If the data carry more information than mere membership of a class (as in the histogram), it makes sense to use more sophisticated symbols than slashes and boxes, namely digits. The digits can then also be used to derive secondary statistics.
Example 2.1 (Marsal, 1979[15]): Distribution of porosity (percentage of pores in the rock) in sandstone. 57 values (in %) were measured along a partial profile of a drill hole (see Table 2.2).
22.1 | 23.5 | 25.3 | 26.6 | 23.9 | 26.0 | 22.8 | 22.3 | 23.1 | 23.0 | 21.0 | 21.8 |
22.0 | 22.2 | 22.3 | 22.4 | 22.4 | 22.4 | 22.3 | 21.6 | 22.1 | 22.6 | 22.1 | 21.9 |
22.3 | 23.9 | 23.2 | 22.5 | 23.7 | 23.3 | 24.4 | 22.6 | 23.9 | 24.2 | 27.6 | 27.9 |
25.2 | 21.7 | 20.0 | 19.8 | 21.5 | 25.6 | 25.3 | 24.1 | 28.6 | 23.7 | 24.0 | 21.8 |
24.9 | 24.2 | 25.0 | 23.7 | 27.3 | 23.0 | 23.8 | 21.2 | 21.1 |
Stem | Leaves | (abs.) | rel. | Cumulation abs. | Cumulation % |
19 | 8 | (1) | .018 | 1 | 1.8 |
20 | 0 | (1) | .018 | 2 | 3.6 |
21 | 756210898 | (9) | .158 | 11 | 19.3 |
22 | 1032345448336161 | (16) | .281 | 27 | 47.4 |
23 | 5927973081907 | (13) | .228 | 40 | 70.2 |
24 | 924120 | (6) | .105 | 46 | 80.7 |
25 | 23063 | (5) | .088 | 51 | 89.5 |
26 | 60 | (2) | .035 | 53 | 93.0 |
27 | 369 | (3) | .053 | 56 | 98.2 |
28 | 6 | (1) | .018 | 57 | 100.0 |
Total | | (57) | | | |
In each row, the first column identifies the "stem", which together with one digit, the "leaf", represents a value that has been truncated after the leaf digit. The resulting number, e.g. 198, has to be read in the given unit, e.g. .1. Table 2.2 should therefore be read such that the number 198 (stem 19, leaf 8) means 198 units = 198 × .1 = 19.8, etc. The column in parentheses shows the absolute frequency of the leaves within each stem, the column after it the relative frequency. The last two columns show the cumulative frequencies. The values at the right (frequencies) correspond to the classical frequency table. A cumulative frequency polygon for these data is shown in Figure 2.5.
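The reading rule above can be sketched in code. The following Python fragment is a minimal illustration, not the program used in the source: it splits each porosity value into stem and leaf for a unit of .1 and derives the frequency columns. The names `POROSITY`, `stem_and_leaf` and `frequency_rows` are chosen here for illustration only, values are assumed to be non-negative, and the leaves are printed in sorted order, whereas the table lists them unsorted.

```python
from collections import defaultdict

# The 57 porosity values (in %) of the example, in the order listed.
POROSITY = [
    22.1, 23.5, 25.3, 26.6, 23.9, 26.0, 22.8, 22.3, 23.1, 23.0, 21.0, 21.8,
    22.0, 22.2, 22.3, 22.4, 22.4, 22.4, 22.3, 21.6, 22.1, 22.6, 22.1, 21.9,
    22.3, 23.9, 23.2, 22.5, 23.7, 23.3, 24.4, 22.6, 23.9, 24.2, 27.6, 27.9,
    25.2, 21.7, 20.0, 19.8, 21.5, 25.6, 25.3, 24.1, 28.6, 23.7, 24.0, 21.8,
    24.9, 24.2, 25.0, 23.7, 27.3, 23.0, 23.8, 21.2, 21.1,
]


def stem_and_leaf(values, unit=0.1):
    """Map each stem to the list of its leaves (values truncated to `unit`)."""
    leaves = defaultdict(list)
    for v in values:
        # Express the value in units; rounding to 6 decimals first guards
        # against floating-point noise such as 19.8 / 0.1 = 197.999...
        n = int(round(v / unit, 6))
        stem, leaf = divmod(n, 10)  # e.g. 198 -> stem 19, leaf 8
        leaves[stem].append(leaf)
    return leaves


def frequency_rows(leaves):
    """Return (stem, leaves, abs. freq., rel. freq., cum. abs., cum. %) rows."""
    total = sum(len(ls) for ls in leaves.values())
    rows, cum = [], 0
    for stem in range(min(leaves), max(leaves) + 1):  # keep empty stems too
        stem_leaves = sorted(leaves.get(stem, []))
        n = len(stem_leaves)
        cum += n
        leaf_str = "".join(str(d) for d in stem_leaves)
        rows.append((stem, leaf_str, n, n / total, cum, 100 * cum / total))
    return rows


for stem, leaf_str, n, rel, cum, cum_pct in frequency_rows(stem_and_leaf(POROSITY)):
    print(f"{stem:2d} | {leaf_str:<16} | ({n}) | {rel:.3f} | {cum:2d} | {cum_pct:5.1f}")
```

For stem 22 this gives 16 leaves and a relative frequency of about .281, in agreement with the table. The sketch computes the cumulative percentages from exact ratios, so the second row comes out as 3.5 where the table shows 3.6.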
In practice it is often not easy to find a unique presentation of a batch of numbers in the form of a stem and leaf display. As an example, take the complete data set of copper grades of soil samples, part of which was used at the beginning of the chapter. The complete data are reproduced in Table 2.7 (Subsection 2.2.7). A quite original computer program (Velleman and Hoaglin, 1981[23]) produces the output shown in Table 2.4.
Note that it is not reasonable to use the integers from 11 to 41 as stems in order to obtain a semi-graphical picture of the distribution. In this case it seems advantageous to let each stem line hold only two different leaf values, with a leaf unit of 1 ppm.
The listed leaves are already ordered. The stems should be identified by at least two digits or symbols. The lines are therefore distinguished by characters derived from the English number words (see Tukey, 1977[22]):
0 | and | 1 | is | *, |
2 | and | 3 | is | T, |
4 | and | 5 | is | F, |
6 | and | 7 | is | S, |
8 | and | 9 | is | . |
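The assignment of leaf digits to stem lines can be sketched as follows. This is a minimal Python illustration, not the program of Velleman and Hoaglin: the function name `five_line_stem_and_leaf` and the sample values are hypothetical (they are not the copper data of Table 2.7), the values are assumed to be non-negative integers in leaf units (here ppm), and the symbol `*` for the leaves 0 and 1 is assumed from Tukey's convention.

```python
def five_line_stem_and_leaf(values):
    """Build a display with five lines per stem, two leaf digits per line."""
    line_leaves = {}
    for v in sorted(values):          # sorting yields ordered leaves
        # v // 2 numbers the lines consecutively: 10,11 -> 5; 12,13 -> 6; ...
        line_leaves.setdefault(v // 2, []).append(str(v % 10))
    lines = []
    for idx in range(min(line_leaves), max(line_leaves) + 1):
        # idx // 5 is the tens digit; idx % 5 selects the line symbol
        # (* for 0,1; T for 2,3; F for 4,5; S for 6,7; . for 8,9).
        label = f"{idx // 5}{'*TFS.'[idx % 5]}"
        lines.append(f"{label} | {''.join(line_leaves.get(idx, []))}")
    return lines


# Hypothetical sample in the range 11 to 41, for illustration only.
sample = [11, 12, 13, 13, 18, 20, 25, 41]
for line in five_line_stem_and_leaf(sample):
    print(line)
```

Empty lines are kept in the output so that the display still reflects the shape of the distribution.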
Stem and leaf displays help to find structure in the data without any hypothesis having been formulated beforehand. In particular, they help to find characteristic parameters of the displayed distribution quickly.