

Stem and Leaf Display

If the data carry more information than mere membership in a class (as in the histogram), it makes sense to use more informative symbols than slashes and boxes, namely digits. The digits can then also be used to compute secondary statistics.

Example 2.1 (Marsal, 1979[15]): Distribution of porosity (proportion of pores in the rock, in percent) in sandstone. 57 values (in %) were measured along part of the profile of a drill hole (see Table 2.2).


Table 2.2: Values of Porosity in Sandstone.
22.1 23.5 25.3 26.6 23.9 26.0 22.8 22.3 23.1 23.0 21.0 21.8
22.0 22.2 22.3 22.4 22.4 22.4 22.3 21.6 22.1 22.6 22.1 21.9
22.3 23.9 23.2 22.5 23.7 23.3 24.4 22.6 23.9 24.2 27.6 27.9
25.2 21.7 20.0 19.8 21.5 25.6 25.3 24.1 28.6 23.7 24.0 21.8
24.9 24.2 25.0 23.7 27.3 23.0 23.8 21.2 21.1      

The distribution of the values is to be presented as a stem and leaf display, which could look like Table 2.3.



Table 2.3: Porosity of Sandstone (Unit = .1).
Stem  Leaves            $h_{i}$  $f_{i}$  Cumulation
                                 rel.     abs.   %
19    8                 (1)      .018      1     1.8
20    0                 (1)      .018      2     3.6
21    756210898         (9)      .158     11    19.3
22    1032345448336161  (16)     .281     27    47.4
23    5927973081907     (13)     .228     40    70.2
24    924120            (6)      .105     46    80.7
25    23063             (5)      .088     51    89.5
26    60                (2)      .035     53    93.0
27    369               (3)      .053     56    98.2
28    6                 (1)      .018     57   100.0
                        (57 $\surd$)
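
A display of this kind can be sketched in a few lines of Python. The following is only an illustration under stated assumptions, not the program used for Table 2.3: the function names `stem_and_leaf` and `frequency_table` are hypothetical, the values are assumed to be non-negative, and each value is read as (10 $\times$ stem + leaf) $\times$ unit after truncation. The leaves are listed in the reading order of Table 2.2, so their order within a stem can differ from the table, and exactly computed cumulative percentages can differ in the last digit from values obtained by adding rounded figures.

```python
from collections import defaultdict
from decimal import Decimal

# Porosity values of Table 2.2, kept as text so that the scaling is exact.
POROSITY = """
22.1 23.5 25.3 26.6 23.9 26.0 22.8 22.3 23.1 23.0 21.0 21.8
22.0 22.2 22.3 22.4 22.4 22.4 22.3 21.6 22.1 22.6 22.1 21.9
22.3 23.9 23.2 22.5 23.7 23.3 24.4 22.6 23.9 24.2 27.6 27.9
25.2 21.7 20.0 19.8 21.5 25.6 25.3 24.1 28.6 23.7 24.0 21.8
24.9 24.2 25.0 23.7 27.3 23.0 23.8 21.2 21.1
""".split()


def stem_and_leaf(values, unit="0.1"):
    """Group values into {stem: [leaf, ...]}; leaves keep the input order."""
    rows = defaultdict(list)
    for text in values:
        scaled = int(Decimal(text) / Decimal(unit))  # truncate after the leaf
        stem, leaf = divmod(scaled, 10)              # assumes values >= 0
        rows[stem].append(leaf)
    return dict(sorted(rows.items()))


def frequency_table(rows):
    """Return one tuple (stem, leaves, h_i, f_i, cum_abs, cum_pct) per stem."""
    n = sum(len(leaves) for leaves in rows.values())
    table, running = [], 0
    for stem, leaves in rows.items():
        h = len(leaves)          # absolute frequency h_i
        running += h             # cumulative absolute frequency
        table.append((stem, "".join(map(str, leaves)), h, h / n,
                      running, 100 * running / n))
    return table


rows = stem_and_leaf(POROSITY)
table = frequency_table(rows)
for stem, leaves, h, f, cum, pct in table:
    print(f"{stem:>3} {leaves:<16} ({h:>2}) {f:.3f} {cum:>3} {pct:5.1f}")
```

Keeping the values as text and scaling them with `Decimal` avoids binary floating-point surprises when a value such as 22.1 is divided by the unit .1.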

In each row, the first column identifies the "stem", which together with one digit, the "leaf", represents a value that has been truncated after the leaf. Such a number, e.g. 198, has to be read in the given unit, e.g. .1. Table 2.3 should therefore be read as follows: the number 198 (stem 19, leaf 8) means 198 $\times$ unit = 198 $\times$ .1 = 19.8, etc. The column headed "$h_{i}$" shows the absolute frequency of the leaves within each stem, the column "$f_{i}$" the relative frequency. The last two columns show the cumulative frequencies. The frequency columns on the right correspond to the classical frequency table. A cumulative frequency polygon for these data is shown in Figure 2.5.

In practice it is often not easy to find a single best way of presenting a batch of numbers as a stem and leaf display. As an example we take the complete data set of copper grades of soil samples, part of which was used at the beginning of the chapter. The complete data are reproduced in Table 2.7 (Subsection 2.2.7). A quite original computer program (Velleman and Hoaglin, 1981[23]) produces the output shown in Table 2.4. Using the integers from 11 to 41 as stems would not be reasonable if we want a semi-graphic picture of the distribution. In this case it seems advantageous to let each stem line hold only two different leaf digits, with a leaf unit of 1 ppm.

Figure 2.5: Cumulative Frequency of Porosity in Sandstone.
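
The points of such a cumulative frequency polygon follow directly from the absolute frequencies of Table 2.3. The sketch below is an illustration only: the function name `polygon_points` is hypothetical, and it assumes that each cumulative relative frequency is plotted at the upper boundary of its stem and that the polygon starts at zero at the lower boundary of the first stem.

```python
# Stems and absolute frequencies h_i, copied from Table 2.3.
STEMS = list(range(19, 29))
COUNTS = [1, 1, 9, 16, 13, 6, 5, 2, 3, 1]


def polygon_points(stems, counts, width=1):
    """Return (x, cumulative relative frequency) pairs for the polygon."""
    n = sum(counts)
    points = [(stems[0], 0.0)]   # assumed start: nothing lies below the first stem
    running = 0
    for stem, h in zip(stems, counts):
        running += h
        points.append((stem + width, running / n))  # upper boundary of the stem
    return points


points = polygon_points(STEMS, COUNTS)
for x, y in points:
    print(f"{x:>3} {y:.3f}")
```

Joining consecutive points by straight lines gives the polygon; the last point lies at a cumulative relative frequency of 1.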


Table 2.4: Grades of Copper in ppm.
...
...6 3T 33
4 3F 5
3 3S 6
2 3. 9
1 4* 1


The listed leaves are already ordered. Each stem line should be identified by at least two digits or symbols. The lines belonging to one stem are therefore distinguished by characters taken from the English words for the leaf digits (see Tukey, 1977[22]):

0 and 1 is $\ast$,
2 and 3 is T (two, three),
4 and 5 is F (four, five),
6 and 7 is S (six, seven),
8 and 9 is . (a period).
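
The labelling of the five lines per stem can be sketched as follows. This is a minimal illustration, not the program of Velleman and Hoaglin: the names `five_line_label` and `five_line_display` are hypothetical, the values are assumed to be non-negative with a leaf unit of 1, and the sample batch consists only of the six largest copper values that are visible in the last lines of Table 2.4.

```python
SYMBOLS = "*TFS."  # leaves 0-1, 2-3, 4-5, 6-7, 8-9


def five_line_label(value):
    """Return (line label, leaf) for a non-negative value with leaf unit 1."""
    stem, leaf = divmod(int(value), 10)
    return f"{stem}{SYMBOLS[leaf // 2]}", leaf


def five_line_display(values):
    """Return the display lines, stems ascending and leaves ordered."""
    rows = {}
    for v in sorted(values):              # sorting orders stems and leaves
        label, leaf = five_line_label(v)
        rows.setdefault(label, []).append(str(leaf))
    return [f"{label} {''.join(leaves)}" for label, leaves in rows.items()]


# Upper tail of the copper data as far as it can be read from Table 2.4.
tail = [33, 33, 35, 36, 39, 41]
lines = five_line_display(tail)
print("\n".join(lines))
```

Lines without any leaves are simply omitted here; a full display would usually print them as empty lines so that gaps in the distribution stay visible.
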
The first column of Table 2.4 shows the depth of the data values, that is, the cumulative frequency up to and including each stem, counted from whichever end of the data set is nearer. For the middle stem the (absolute) frequency is given instead, set in parentheses.

Quite often, data contain extremely high or low values which do not fit the picture of the rest of the data. A usual histogram, and also the stem and leaf display described so far, may then become very uninformative because very few classes contain most of the data values. The solution is a special treatment of the outliers: we take them out and list them separately next to the stem and leaf display. The output in Table 2.5 illustrates this with the nickel data of Table 2.7. The relatively high values (e.g. 463 and 472 ppm) cause a third of the values to be concentrated in just one stem. The width of a stem is here 50 units. The second presentation shows the distribution without the extreme values, which are listed after the key word HI. The width of a stem changes to 20 units. Note that the label of the first stem contains a "+" sign because stems labelled "-0" are also possible. This results from truncation.

Stem and leaf displays help to find structure in the data without any hypothesis having been formulated beforehand. In particular, they help to find characteristic parameters of the displayed distribution quickly.


Table 2.5: Grades of Nickel [ppm], 2 Definitions of Stems.
...
... 40, 41, 43, 43, 44, 46, 47
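
The separate listing of extreme values after the key word HI can be sketched as follows. Everything in this block is an assumption made for illustration: the text does not state which rule the program uses to decide what counts as extreme, so quartile fences at 1.5 times the interquartile range are swapped in here; the key word LO for low extremes, the function names and the stem width of 10 are hypothetical; and the batch is made up, with only the values 463 and 472 taken from the text.

```python
import statistics

# Hypothetical nickel-like batch in ppm; only 463 and 472 are quoted in the text.
NICKEL = [12, 15, 18, 21, 22, 25, 27, 30, 33, 36, 41, 463, 472]


def split_outliers(values, k=1.5):
    """Split values into (lo, body, hi) using quartile fences (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low_fence, high_fence = q1 - k * spread, q3 + k * spread
    lo = sorted(v for v in values if v < low_fence)
    hi = sorted(v for v in values if v > high_fence)
    body = sorted(v for v in values if low_fence <= v <= high_fence)
    return lo, body, hi


def display(values, width=10):
    """Print a plain display of the body, with the extremes after LO / HI."""
    lo, body, hi = split_outliers(values)
    if lo:
        print("LO", ", ".join(map(str, lo)))
    rows = {}
    for v in body:
        stem, leaf = divmod(v, width)      # assumes non-negative integers
        rows.setdefault(stem, []).append(str(leaf))
    for stem, leaves in rows.items():
        print(f"{stem:>3} {''.join(leaves)}")
    if hi:
        print("HI", ", ".join(map(str, hi)))


lo, body, hi = split_outliers(NICKEL)
display(NICKEL)
```

Once the two extreme values are set aside, the remaining eleven values spread over several stems instead of being compressed into one.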



Rudolf Dutter 2003-03-13