Grades of copper from 5 soil samples were measured: 24, 28, 22, 23 and 24 ppm. What is the distribution of the values of these grades? A histogram could look as follows:
Figure 2.1 is already very interesting; however, it becomes more informative if we have more data at hand. For example, 80 sample values are shown in Figure 2.2. The average of all the values can be seen to be approximately 22 ppm.
The measured variable (the grade of copper) is continuous.
In practice, however, the values are registered in a discrete
manner, and to display the frequencies in a histogram they have
to be grouped into classes anyway. The width of the classes is an
open question. If the grade is registered only to a fixed
resolution in ppm, a simple histogram might look like the one in
Figure 2.3.
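The effect of the class width can be tried out on a few numbers. The following is a minimal sketch, not the procedure used for the figures: the function name classify, its origin parameter and the half-open class convention [lower, lower + width) are assumptions made for illustration. The input consists of the unsigned values from the first row of Table 2.1.

```python
import math
from collections import Counter


def classify(values, width, origin=0.0):
    """Count the values per class [lower, lower + width).

    Assumption for this sketch: classes are half-open and aligned to
    `origin`; the text itself leaves the class width (and limits) open.
    """
    counts = Counter()
    for x in values:
        # Lower limit of the class that contains x.
        lower = origin + math.floor((x - origin) / width) * width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Unsigned values from the first row of Table 2.1 (ppm).
grades = [28.3, 25.4, 27.0, 20.9, 24.0, 25.1, 22.2, 24.8, 25.1]

by_one = classify(grades, width=1.0)  # classes 1 ppm wide
by_two = classify(grades, width=2.0)  # classes 2 ppm wide

for lower, count in by_two.items():
    print(f"[{lower:4.1f}, {lower + 2.0:4.1f})  {count}")
```

With a width of 1 ppm the nine values spread over six classes; with a width of 2 ppm five of them fall into the single class from 24 to 26 ppm, which shows how strongly the picture depends on the chosen width.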
In theory, one could keep increasing the number of classes as the number of sample values grows, so that the outline of the histogram tends to a smooth curve, as shown in Figure 2.4.
The frequencies in the histograms were given as absolute values.
Of course, it is also sensible to show them on a scale of relative
frequencies, which simply means dividing each absolute frequency
by the total number of samples.
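As a small worked example of this division, the sketch below uses the five grades quoted at the beginning (24, 28, 22, 23 and 24 ppm). The variable names are chosen for illustration only.

```python
from collections import Counter

# The five copper grades from the opening example (ppm).
grades = [24, 28, 22, 23, 24]

absolute = Counter(grades)   # absolute frequency of each value
n = len(grades)              # total number of samples

# Relative frequency = absolute frequency / total number of samples.
relative = {grade: count / n for grade, count in sorted(absolute.items())}

for grade, share in relative.items():
    print(f"{grade} ppm: {absolute[grade]} of {n} = {share:.1f}")
```

The value 24 ppm occurs twice among five samples and therefore has a relative frequency of 0.4; the relative frequencies of all classes add up to 1.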
Sometimes a data value is equal to a class limit, which forces a
decision about which class the value should belong to. In our
example, the "observer" (the person who analyses the grade) was
told to register the number to a tenth of a ppm and, whenever a
value is very close to a half ppm (a registered value ending in
.5), to add a sign + or - to the value if it is just above or
just below, respectively. The registered values are reproduced in
Table 2.1.
28.3 | 25.4 | 27.0 | 25.5- | 20.9 | 24.0 | 25.1 | 22.2 | 24.8 | 25.1 |
23.5- | 24.6 | 26.1 | 24.7 | 26.5- | 27.5- | 25.6 | 22.9 | 25.5+ | 23.5- |
23.9 | 26.5+ | 24.5- | 24.3 | 24.5+ | 27.0 | 27.3 | 25.0 | 22.8 | 23.5 |
26.4 | 27.1 | 23.4 | 24.1 | 26.7 | 24.9 | 23.5+ | 27.4 | 25.5+ | 22.8 |
25.1 | 24.7 | 26.3 | 21.8 | 23.2 | 24.3 | 24.5- | 26.0 | 24.1 | 27.5 |
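The role of the signs can be made concrete with a short sketch. It assumes classes that are 1 ppm wide with limits at the half-ppm marks (so each class is centred on a whole number); this class scheme, the function name class_centre and the choice to return None for an unsigned value lying exactly on a limit are assumptions for illustration, not something stated in the text.

```python
from collections import Counter
from decimal import Decimal, ROUND_FLOOR

HALF = Decimal("0.5")


def class_centre(token):
    """Return the whole-number centre of the 1 ppm class for one entry.

    Assumed scheme: class limits at x.5, so "25.5-" belongs to the class
    centred on 25 and "25.5+" to the class centred on 26. An unsigned
    entry exactly on a limit stays undecided (None).
    """
    sign = token[-1] if token[-1] in "+-" else ""
    value = Decimal(token.rstrip("+-"))
    on_limit = value % 1 == HALF
    # Round half up: this is the class just above a limit.
    upper = int((value + HALF).to_integral_value(rounding=ROUND_FLOOR))
    if on_limit and sign == "-":
        return upper - 1          # just below the limit
    if on_limit and sign == "":
        return None               # no sign recorded: cannot be decided
    return upper


# Second row of Table 2.1, exactly as registered.
row = ["23.5-", "24.6", "26.1", "24.7", "26.5-",
       "27.5-", "25.6", "22.9", "25.5+", "23.5-"]

centres = [class_centre(t) for t in row]
counts = Counter(centres)
print(sorted(counts.items()))
```

Exact decimal arithmetic is used so that an entry such as 25.5 is recognised as lying on a limit without floating-point doubt. Entries such as the unsigned 23.5 and 27.5 in Table 2.1 would come back as None under this sketch, which makes the need for a convention visible.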
In practice, with pencil and scratch paper, one would not draw a histogram but rather produce a list of frequencies by "tallying": putting a mark (a simple stroke per item) for each value in its class, or using up to 10 symbols (dots, box lines, crossed lines) (see Tukey, 1977 [22]). The presentations that follow avoid the complication of rounding up or down: a number is simply truncated to find the class to which it belongs. At the same time, an attempt is made to show as much information as possible in each class.
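A tally by truncation can be sketched in a few lines. The example below is one possible reading, not the presentation used later in the text: it takes the integer part of each entry as the class and keeps the first decimal digit as additional information, in the manner of a stem-and-leaf display. The function name stem_and_leaf is hypothetical, and the + and - signs are simply dropped because truncation does not need them.

```python
from collections import defaultdict


def stem_and_leaf(tokens):
    """Group entries by their truncated integer part (the class) and
    keep the first decimal digit of each entry as a 'leaf'."""
    stems = defaultdict(list)
    for token in tokens:
        number = token.rstrip("+-")            # signs are irrelevant here
        stem, _, leaf = number.partition(".")  # "24.7" -> "24", ".", "7"
        stems[int(stem)].append(leaf or "0")
    return {stem: "".join(sorted(leaves))
            for stem, leaves in sorted(stems.items())}


# Fifth row of Table 2.1, exactly as registered.
row = ["25.1", "24.7", "26.3", "21.8", "23.2",
       "24.3", "24.5-", "26.0", "24.1", "27.5"]

display = stem_and_leaf(row)
for stem, leaves in display.items():
    # Class, retained digits, and one stroke per item as a plain tally.
    print(f"{stem:>2} | {leaves:<5} {'/' * len(leaves)}")
```

For this row the class 24 collects the leaves 1, 3, 5 and 7, so the display shows both the frequency (four strokes) and the digits that a plain tally would discard.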