This is “Measures of Variability”, section 2.3 from the book Beginning Statistics (v. 1.0).
Look at the two data sets in Table 2.1 "Two Data Sets" and the graphical representation of each, called a dot plot, in Figure 2.10 "Dot Plots of Data Sets".
Table 2.1 Two Data Sets
Data Set I: | 40 | 38 | 42 | 40 | 39 | 39 | 43 | 40 | 39 | 40 |
Data Set II: | 46 | 37 | 40 | 33 | 42 | 36 | 40 | 47 | 34 | 45 |
Figure 2.10 Dot Plots of Data Sets
The two sets of ten measurements each center at the same value: they both have mean, median, and mode 40. Nevertheless a glance at the figure shows that they are markedly different. In Data Set I the measurements vary only slightly from the center, while for Data Set II the measurements vary greatly. Just as we have attached numbers to a data set to locate its center, we now wish to associate to each data set numbers that measure quantitatively how the data either scatter away from the center or cluster close to it. These new quantities are called measures of variability, and we will discuss three of them.
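The claim that both data sets share the same center can be checked directly. The following is a minimal sketch using Python's standard `statistics` module; the variable names `data_set_1` and `data_set_2` are our own labels for the rows of Table 2.1, not names used by the book.

```python
from statistics import mean, median, mode

# The two rows of Table 2.1 "Two Data Sets".
data_set_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

for name, data in (("Data Set I", data_set_1), ("Data Set II", data_set_2)):
    # Each data set has a single most frequent value, so mode() is unambiguous here.
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

All three measures of center come out to 40 for both data sets, so none of them can distinguish the tightly clustered set from the widely scattered one.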
The first measure of variability that we discuss is the simplest.
The range of a data set is the number $R$ defined by the formula
$$R={x}_{\text{max}}-{x}_{\text{min}}$$
where ${x}_{\text{max}}$ is the largest measurement in the data set and ${x}_{\text{min}}$ is the smallest.
Find the range of each data set in Table 2.1 "Two Data Sets".
Solution:
For Data Set I the maximum is 43 and the minimum is 38, so the range is $R=43-38=5.$
For Data Set II the maximum is 47 and the minimum is 33, so the range is $R=47-33=14.$
The range is a measure of variability because it indicates the size of the interval over which the data points are distributed. A smaller range indicates less variability (less dispersion) among the data, whereas a larger range indicates the opposite.
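As a quick illustration, the range computation for the two data sets of Table 2.1 might be sketched in Python as follows. The helper name `data_range` is hypothetical, chosen only for this example.

```python
def data_range(data):
    """Return R = x_max - x_min for a non-empty sequence of numbers."""
    return max(data) - min(data)


data_set_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

print(data_range(data_set_1))  # 43 - 38 = 5
print(data_range(data_set_2))  # 47 - 33 = 14
```

Because the range uses only the two extreme observations, it ignores how the remaining measurements are distributed between them.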
The other two measures of variability that we will consider are more elaborate and also depend on whether the data set is just a sample drawn from a much larger population or is the whole population itself (that is, a census).
The sample variance of a set of $n$ sample data is the number $s^{2}$ defined by the formula
$$s^{2}=\frac{\Sigma(x-\bar{x})^{2}}{n-1}$$
which by algebra is equivalent to the formula
$$s^{2}=\frac{\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}}{n-1}$$
The sample standard deviation of a set of $n$ sample data is the square root of the sample variance, hence is the number $s$ given by the formulas
$$s=\sqrt{\frac{\Sigma(x-\bar{x})^{2}}{n-1}}=\sqrt{\frac{\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}}{n-1}}$$
Although the first formula in each case looks less complicated than the second, the latter is easier to use in hand computations, and is called a shortcut formula.
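The algebra behind the equivalence of the defining formula and the shortcut formula is short. Here is a sketch of it, using only the definition of the sample mean, $\bar{x}=\frac{1}{n}\Sigma x$, and the fact that $\bar{x}$ is the same number in every term of the sum:

$$\begin{array}{rcl}\Sigma(x-\bar{x})^{2} & = & \Sigma x^{2}-2\bar{x}\,\Sigma x+n\bar{x}^{2}\\ & = & \Sigma x^{2}-\frac{2}{n}(\Sigma x)^{2}+\frac{1}{n}(\Sigma x)^{2}\\ & = & \Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}\end{array}$$

Dividing both sides by $n-1$ turns the left side into the defining formula for $s^{2}$ and the right side into the shortcut formula.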
Find the sample variance and the sample standard deviation of Data Set II in Table 2.1 "Two Data Sets".
Solution:
To use the defining formula (the first formula) in the definition we first compute for each observation $x$ its deviation $x-\bar{x}$ from the sample mean. Since the mean of the data is $\bar{x}=40$, we obtain the ten numbers displayed in the second line of the supplied table.
$$\begin{array}{c|rrrrrrrrrr} x & 46 & 37 & 40 & 33 & 42 & 36 & 40 & 47 & 34 & 45 \\ x-\bar{x} & 6 & -3 & 0 & -7 & 2 & -4 & 0 & 7 & -6 & 5 \end{array}$$
Then
$$\Sigma(x-\bar{x})^{2}=6^{2}+(-3)^{2}+0^{2}+(-7)^{2}+2^{2}+(-4)^{2}+0^{2}+7^{2}+(-6)^{2}+5^{2}=224$$
so
$$s^{2}=\frac{\Sigma(x-\bar{x})^{2}}{n-1}=\frac{224}{9}=24.\overline{8}$$
and
$$s=\sqrt{24.\overline{8}}\approx 4.99$$
The student is encouraged to compute the ten deviations for Data Set I and verify that their squares add up to 20, so that the sample variance and standard deviation of Data Set I are the much smaller numbers $s^{2}=20/9=2.\overline{2}$ and $s=\sqrt{20/9}\approx 1.49.$
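The computation with the defining formula can be mirrored step by step in code. The following is a minimal sketch, not a library-grade implementation; the function names `sample_variance` and `sample_std` are hypothetical and chosen only for this illustration.

```python
from math import sqrt


def sample_variance(data):
    """Defining formula: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Sample standard deviation: square root of the sample variance."""
    return sqrt(sample_variance(data))


data_set_1 = [40, 38, 42, 40, 39, 39, 43, 40, 39, 40]
data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

print(round(sample_variance(data_set_2), 2), round(sample_std(data_set_2), 2))  # 24.89 4.99
print(round(sample_variance(data_set_1), 2), round(sample_std(data_set_1), 2))  # 2.22 1.49
```

The second line of output carries out the check suggested for Data Set I: the squared deviations add up to 20, giving $s^{2}=20/9$.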
Find the sample variance and the sample standard deviation of the ten GPAs in Note 2.12 "Example 3" in Section 2.2 "Measures of Central Location".
$$\begin{array}{cccccccccc} 1.90 & 3.00 & 2.53 & 3.71 & 2.12 & 1.76 & 2.71 & 1.39 & 4.00 & 3.33 \end{array}$$
Solution:
Since
$$\Sigma x=1.90+3.00+2.53+3.71+2.12+1.76+2.71+1.39+4.00+3.33=26.45$$
and
$$\begin{array}{rcl}\Sigma x^{2} & = & 1.90^{2}+3.00^{2}+2.53^{2}+3.71^{2}+2.12^{2}+1.76^{2}\\ & & {}+2.71^{2}+1.39^{2}+4.00^{2}+3.33^{2}\\ & = & 76.7321\end{array}$$
the shortcut formula gives
$$s^{2}=\frac{\Sigma x^{2}-\frac{1}{n}(\Sigma x)^{2}}{n-1}=\frac{76.7321-\frac{(26.45)^{2}}{10}}{10-1}=\frac{6.77185}{9}=0.75242\overline{7}$$
and
$$s=\sqrt{0.75242\overline{7}}\approx 0.867$$
The sample variance has different units from the data. For example, if the units in the data set were inches, the new units would be inches squared, or square inches. It is thus primarily of theoretical importance and will not be considered further in this text, except in passing.
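The shortcut-formula computation for the ten GPAs in the example above can be reproduced in a few lines of code. This is a sketch under our own naming (the function `sample_variance_shortcut` and the list `gpas` are hypothetical names, not taken from the book).

```python
from math import sqrt


def sample_variance_shortcut(data):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


gpas = [1.90, 3.00, 2.53, 3.71, 2.12, 1.76, 2.71, 1.39, 4.00, 3.33]

s2 = sample_variance_shortcut(gpas)
s = sqrt(s2)
print(round(s2, 4), round(s, 3))  # 0.7524 0.867
```

The shortcut formula is convenient by hand because it avoids computing ten separate deviations. In floating-point arithmetic, however, subtracting two large, nearly equal sums can lose precision, so for data with a large mean and a small spread the defining formula is usually the safer choice in software.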
If the data set comprises the whole population, then the population standard deviation, denoted $\sigma$ (the lower case Greek letter sigma), and its square, the population variance $\sigma^{2}$, are defined as follows.
The population variance and population standard deviation of a set of $N$ population data are the numbers $\sigma^{2}$ and $\sigma$ defined by the formulas
$$\sigma^{2}=\frac{\Sigma(x-\mu)^{2}}{N}\quad\text{and}\quad\sigma=\sqrt{\frac{\Sigma(x-\mu)^{2}}{N}}$$
Note that the denominator in the fraction is the full number of observations, not that number reduced by one, as is the case with the sample standard deviation. Since most data sets are samples, we will always work with the sample standard deviation and variance.
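To see the effect of the different denominators, the same ten numbers can be treated once as a sample and once as a whole population. The sketch below uses Python's standard `statistics` module, where `variance` and `stdev` divide by $n-1$ while `pvariance` and `pstdev` divide by $N$. Treating Data Set II as a population is purely an assumption made for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

data_set_2 = [46, 37, 40, 33, 42, 36, 40, 47, 34, 45]

# Treating the ten values as a sample: divide by n - 1 = 9.
s2 = variance(data_set_2)
s = stdev(data_set_2)

# Treating the same ten values as a whole population: divide by N = 10.
sigma2 = pvariance(data_set_2)
sigma = pstdev(data_set_2)

print(f"sample:     s^2 = {s2:.2f}, s = {s:.2f}")
print(f"population: sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")
```

Both computations start from the same sum of squared deviations, 224. Dividing by 9 gives $s^{2}\approx 24.89$, while dividing by 10 gives $\sigma^{2}=22.4$, so the population figures are always somewhat smaller than the sample figures for the same numbers.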
Finally, in many real-life situations the most important statistical issues have to do with comparing the means and standard deviations of two data sets. Figure 2.11 "Difference between Two Data Sets" illustrates how a difference in one or both of the sample mean and the sample standard deviation is reflected in the appearance of the data set, as shown by the curves derived from the relative frequency histograms built using the data.
Figure 2.11 Difference between Two Data Sets
The range, the standard deviation, and the variance each give a quantitative answer to the question “How variable are the data?”
Find the range, the variance, and the standard deviation for the following sample.
$$1 \quad 2 \quad 3 \quad 4$$
Find the range, the variance, and the standard deviation for the following sample.
$$2 \quad -3 \quad 6 \quad 0 \quad 3 \quad 1$$
Find the range, the variance, and the standard deviation for the following sample.
$$2 \quad 1 \quad 2 \quad 7$$
Find the range, the variance, and the standard deviation for the following sample.
$$-1 \quad 0 \quad 1 \quad 4 \quad 1 \quad 1$$
Find the range, the variance, and the standard deviation for the sample represented by the data frequency table.
$$\begin{array}{c|ccc} x & 1 & 2 & 7 \\ f & 1 & 2 & 1 \end{array}$$
Find the range, the variance, and the standard deviation for the sample represented by the data frequency table.
$$\begin{array}{c|cccc} x & -1 & 0 & 1 & 4 \\ f & 1 & 1 & 3 & 1 \end{array}$$
Find the range, the variance, and the standard deviation for the sample of ten IQ scores randomly selected from a school for academically gifted students.
$$\begin{array}{ccccc} 132 & 162 & 133 & 145 & 148 \\ 139 & 147 & 160 & 150 & 153 \end{array}$$
Find the range, the variance, and the standard deviation for the sample of ten IQ scores randomly selected from a school for academically gifted students.
$$\begin{array}{ccccc} 142 & 152 & 138 & 145 & 148 \\ 139 & 147 & 155 & 150 & 153 \end{array}$$
Consider the data set represented by the table
$$\begin{array}{c|ccccccc} x & 26 & 27 & 28 & 29 & 30 & 31 & 32 \\ f & 3 & 4 & 16 & 12 & 6 & 2 & 1 \end{array}$$
Find the sample standard deviation for the data
$$\begin{array}{c|cccccccccc} x & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ f & 384 & 208 & 98 & 56 & 28 & 12 & 8 & 2 & 3 & 1 \end{array}$$
A random sample of 49 invoices for repairs at an automotive body shop is taken. The data are arrayed in the stem and leaf diagram shown. (Stems are thousands of dollars, leaves are hundreds, so that for example the largest observation is 3,800.)
$$\begin{array}{c|ccccccccccc} 3 & 5 & 6 & 8 \\ 3 & 0 & 0 & 1 & 1 & 2 & 4 \\ 2 & 5 & 6 & 6 & 7 & 7 & 8 & 8 & 9 & 9 \\ 2 & 0 & 0 & 0 & 0 & 1 & 2 & 2 & 4 \\ 1 & 5 & 5 & 5 & 6 & 6 & 7 & 7 & 7 & 8 & 8 & 9 \\ 1 & 0 & 0 & 1 & 3 & 4 & 4 & 4 \\ 0 & 5 & 6 & 8 & 8 \\ 0 & 4 \end{array}$$
For these data, $\Sigma x=101{,}100$, $\Sigma x^{2}=244{,}830{,}000.$
What must be true of a data set if its standard deviation is 0?
A data set consisting of 25 measurements has standard deviation 0. One of the measurements has value 17. What are the other 24 measurements?
Create a sample data set of size n = 3 for which the range is 0 and the sample mean is 2.
Create a sample data set of size n = 3 for which the sample variance is 0 and the sample mean is 1.
The sample $\{-1, 0, 1\}$ has mean $\bar{x}=0$ and standard deviation $s=1$. Create a sample data set of size $n=3$ for which $\bar{x}=0$ and $s$ is greater than 1.
The sample $\{-1, 0, 1\}$ has mean $\bar{x}=0$ and standard deviation $s=1$. Create a sample data set of size $n=3$ for which $\bar{x}=0$ and the standard deviation $s$ is less than 1.
Begin with the following set of data, call it Data Set I.
$$\begin{array}{ccccccccccc} 5 & -2 & 6 & 14 & -3 & 0 & 1 & 4 & 3 & 2 & 5 \end{array}$$
Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
Large Data Set 1 lists the SAT scores of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
Large Data Set 1 lists the GPAs of 1,000 students.
http://www.flatworldknowledge.com/sites/all/files/data1.xls
Large Data Sets 7, 7A, and 7B list the survival times in days of 140 laboratory mice with thymic leukemia from onset to death.
http://www.flatworldknowledge.com/sites/all/files/data7.xls
http://www.flatworldknowledge.com/sites/all/files/data7A.xls
http://www.flatworldknowledge.com/sites/all/files/data7B.xls
$R=3$, $s^{2}=1.7$, $s=1.3$.
$R=6$, $s^{2}=7.\overline{3}$, $s=2.7$.
$R=6$, $s^{2}=7.3$, $s=2.7$.
$R=30$, $s^{2}=103.2$, $s=10.2$.
$\bar{x}=28.55$, $s=1.3$.
All are 17.
{1,1,1}
One example is $\{-0.5, 0, 0.5\}.$