Small Sample Estimation of a Population Mean

7.2 Small Sample Estimation of a Population Mean

Learning Objectives

To become familiar with Student’s t-distribution.
To understand how to apply additional formulas for a confidence interval for a population mean.

The confidence interval formulas in the previous section are based on the Central Limit Theorem, the statement that for large samples $\bar{X}$ is approximately normally distributed with mean μ and standard deviation $σ ∕ \sqrt{n}$. When the population mean μ is estimated with a small sample (n < 30), the Central Limit Theorem does not apply. In order to proceed we assume that the numerical population from which the sample is taken has a normal distribution to begin with. If this condition is satisfied, then when the population standard deviation σ is known the old formula $\bar{x} \pm z_{α ∕ 2} (σ ∕ \sqrt{n})$ can still be used to construct a $100 (1 - α) \%$ confidence interval for μ.

If the population standard deviation is unknown and the sample size n is small, then when we substitute the sample standard deviation s for σ the normal approximation is no longer valid. The solution is to use a different distribution, called Student’s t-distribution (a distribution of a continuous random variable that resembles the standard normal distribution but has heavier tails), with $n − 1$ degrees of freedom (a number that specifies a particular t-distribution and that is computed based on the sample size). Student’s t-distribution is very much like the standard normal distribution in that it is centered at 0 and has the same qualitative bell shape, but it has heavier tails than the standard normal distribution does, as indicated by Figure 7.5 "Student’s t-Distribution", in which the curve (in brown) that meets the dashed vertical line at the lowest point is the t-distribution with two degrees of freedom, the next curve (in blue) is the t-distribution with five degrees of freedom, and the thin curve (in red) is the standard normal distribution. As also indicated by the figure, as the sample size n increases, Student’s t-distribution ever more closely resembles the standard normal distribution. Although there is a different t-distribution for every value of n, once the sample size is 30 or more it is typically acceptable to use the standard normal distribution instead, as we will always do in this text.

Figure 7.5 Student’s t-Distribution
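The phrase "heavier tails" can be checked numerically. The following is a minimal sketch, not part of the original text: it integrates the standard density formula for Student’s t-distribution with Simpson’s rule and compares the area to the right of 2 for several degrees of freedom with the corresponding area under the standard normal curve. The function names `t_pdf` and `t_right_tail` are our own, chosen for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(x, df, steps=2000):
    """Area to the right of x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric about 0, so the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3

for df in (2, 5, 30):
    print(f"df = {df:2d}: P(T > 2) = {t_right_tail(2, df):.4f}")
print(f"normal : P(Z > 2) = {1 - NormalDist().cdf(2):.4f}")
```

The printed tail areas shrink toward the normal value as the degrees of freedom grow, which is the behavior the figure depicts.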

Just as the symbol $z_c$ stands for the value that cuts off a right tail of area c in the standard normal distribution, so the symbol $t_c$ stands for the value that cuts off a right tail of area c in Student’s t-distribution. This gives us the following confidence interval formulas.

Small Sample $100 (1 - α) \%$ Confidence Interval for a Population Mean

If σ is known: $\bar{x} \pm z_{α ∕ 2} (\frac{σ}{\sqrt{n}})$

If σ is unknown: $\bar{x} \pm t_{α ∕ 2} (\frac{s}{\sqrt{n}})$ (degrees of freedom $df = n − 1$)

The population must be normally distributed.

A sample is considered small when n < 30.

To use the new formula we use the line in Figure 12.3 "Critical Values of t" that corresponds to the relevant number of degrees of freedom, $n − 1$.
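When the table is not at hand, a critical value can be approximated directly from the definition of $t_c$ as the value that cuts off a right tail of area c. The sketch below is one way to do this with only the Python standard library: it integrates the t density numerically and then searches for $t_c$ by bisection. The helper names (`t_pdf`, `t_right_tail`, `t_critical`) are hypothetical, and the numerical method is our choice, not the text’s.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(x, df, steps=2000):
    """Area to the right of x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(c, df):
    """Approximate t_c, the value with right-tail area c (0 < c < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_right_tail(hi, df) > c:  # widen the bracket until it contains t_c
        hi *= 2
    for _ in range(50):              # bisection: halve the bracket each pass
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Right-tail area 0.025 with 14 degrees of freedom, and 0.05 with 11.
print(round(t_critical(0.025, 14), 3))
print(round(t_critical(0.05, 11), 3))
```

The two printed values can be compared with the table entries 2.145 and 1.796 used in the examples of this section.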

Example 5

A sample of size 15 drawn from a normally distributed population has sample mean 35 and sample standard deviation 14. Construct a 95% confidence interval for the population mean, and interpret its meaning.

Solution:

Since the population is normally distributed, the sample is small, and the population standard deviation is unknown, the formula that applies is

$\bar{x} \pm t_{α ∕ 2} (\frac{s}{\sqrt{n}})$

Confidence level 95% means that $α = 1 - 0.95 = 0.05$ so $α ∕ 2 = 0.025$. Since the sample size is n = 15, there are $n − 1 = 14$ degrees of freedom. By Figure 12.3 "Critical Values of t", $t_{0.025} = 2.145$. Thus

$\bar{x} \pm t_{α ∕ 2} (\frac{s}{\sqrt{n}}) = 35 \pm 2.145 (\frac{14}{\sqrt{15}}) = 35 \pm 7.8$

One may be 95% confident that the true value of μ is contained in the interval $(35 - 7.8, 35 + 7.8) = (27.2, 42.8)$.
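The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch in which the critical value 2.145 is taken from the table, exactly as in the solution, rather than computed; the function name `t_interval` is our own.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Example 5: n = 15, so df = 14 and t_0.025 = 2.145 (value read from the table).
lower, upper = t_interval(xbar=35, s=14, n=15, t_crit=2.145)
print(f"({lower:.1f}, {upper:.1f})")  # (27.2, 42.8)
```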

Example 6

A random sample of 12 students from a large university yields mean GPA 2.71 with sample standard deviation 0.51. Construct a 90% confidence interval for the mean GPA of all students at the university. Assume that the numerical population of GPAs from which the sample is taken has a normal distribution.

Solution:

Since the population is normally distributed, the sample is small, and the population standard deviation is unknown, the formula that applies is

$\bar{x} \pm t_{α ∕ 2} (\frac{s}{\sqrt{n}})$

Confidence level 90% means that $α = 1 - 0.90 = 0.10$ so $α ∕ 2 = 0.05$. Since the sample size is n = 12, there are $n − 1 = 11$ degrees of freedom. By Figure 12.3 "Critical Values of t", $t_{0.05} = 1.796$. Thus

$\bar{x} \pm t_{α ∕ 2} (\frac{s}{\sqrt{n}}) = 2.71 \pm 1.796 (\frac{0.51}{\sqrt{12}}) = 2.71 \pm 0.26$

One may be 90% confident that the true average GPA of all students at the university is contained in the interval $(2.71 - 0.26, 2.71 + 0.26) = (2.45, 2.97)$.

Compare Note 7.9 "Example 4" in Section 7.1 "Large Sample Estimation of a Population Mean" and Note 7.16 "Example 6". The summary statistics in the two samples are the same, but the 90% confidence interval for the average GPA of all students at the university in Note 7.9 "Example 4" in Section 7.1 "Large Sample Estimation of a Population Mean", $(2.63, 2.79)$, is shorter than the 90% confidence interval $(2.45, 2.97)$ in Note 7.16 "Example 6". This is partly because in Note 7.9 "Example 4" the sample size is larger; there is more information pertaining to the true value of μ in the large data set than in the small one.
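The sample size enters the margin of error through the factor $s ∕ \sqrt{n}$, but the critical value also plays a role: for the same tail area, $t_{0.05}$ with 11 degrees of freedom is larger than $z_{0.05}$. The sketch below isolates that second effect by holding n = 12 fixed and swapping the critical value. It is for comparison only; the z-based margin is not a valid interval here, since σ is unknown and the sample is small.

```python
import math
from statistics import NormalDist

s, n = 0.51, 12
std_error = s / math.sqrt(n)

z_crit = NormalDist().inv_cdf(0.95)  # z_0.05, about 1.645
t_crit = 1.796                       # t_0.05 with 11 df, from Example 6

print(f"margin with z: {z_crit * std_error:.3f}")
print(f"margin with t: {t_crit * std_error:.3f}")
```

The t-based margin is the wider of the two, reflecting the extra uncertainty introduced by estimating σ with s from only 12 observations.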

Key Takeaways

In selecting the correct formula for construction of a confidence interval for a population mean ask two questions: is the population standard deviation σ known or unknown, and is the sample large or small?
We can construct confidence intervals with small samples only if the population is normal.
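The two questions above, together with the normality requirement, can be written as a small decision rule. This is a sketch under the conventions of this text (a sample is small when n < 30, and the standard normal distribution is used for all large samples); the function name and the returned labels are our own.

```python
def interval_formula(n, sigma_known, population_normal):
    """Name the confidence-interval formula for a population mean."""
    if n >= 30:
        # Large sample: the Central Limit Theorem applies.
        return "z with sigma" if sigma_known else "z with s"
    if not population_normal:
        # Small sample from a non-normal population: neither formula applies.
        raise ValueError("small sample requires a normally distributed population")
    return "z with sigma" if sigma_known else "t with s, df = n - 1"

print(interval_formula(15, sigma_known=False, population_normal=True))
print(interval_formula(16, sigma_known=True, population_normal=True))
```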

Exercises

Basic

A random sample is drawn from a normally distributed population of known standard deviation 5. Construct a 99.8% confidence interval for the population mean based on the information given (not all of the information given need be used).
1. n = 16, $\bar{x} = 98$, s = 5.6
2. n = 9, $\bar{x} = 98$, s = 5.6
A random sample is drawn from a normally distributed population of known standard deviation 10.7. Construct a 95% confidence interval for the population mean based on the information given (not all of the information given need be used).
1. n = 25, $\bar{x} = 103.3$, s = 11.0
2. n = 4, $\bar{x} = 103.3$, s = 11.0
A random sample is drawn from a normally distributed population of unknown standard deviation. Construct a 99% confidence interval for the population mean based on the information given.
1. n = 18, $\bar{x} = 386$, s = 24
2. n = 7, $\bar{x} = 386$, s = 24
A random sample is drawn from a normally distributed population of unknown standard deviation. Construct a 98% confidence interval for the population mean based on the information given.
1. n = 8, $\bar{x} = 58.3$, s = 4.1
2. n = 27, $\bar{x} = 58.3$, s = 4.1
A random sample of size 14 is drawn from a normal population. The summary statistics are $\bar{x} = 933$ and s = 18.
1. Construct an 80% confidence interval for the population mean μ.
2. Construct a 90% confidence interval for the population mean μ.
3. Comment on why one interval is longer than the other.
A random sample of size 28 is drawn from a normal population. The summary statistics are $\bar{x} = 68.6$ and s = 1.28.
1. Construct a 95% confidence interval for the population mean μ.
2. Construct a 99.5% confidence interval for the population mean μ.
3. Comment on why one interval is longer than the other.

Applications

City planners wish to estimate the mean lifetime of the most commonly planted trees in urban settings. A sample of 16 recently felled trees yielded mean age 32.7 years with standard deviation 3.1 years. Assuming the lifetimes of all such trees are normally distributed, construct a 99.8% confidence interval for the mean lifetime of all such trees.
To estimate the number of calories in a cup of diced chicken breast meat, the number of calories in a sample of four separate cups of meat is measured. The sample mean is 211.8 calories with sample standard deviation 0.9 calorie. Assuming the caloric content of all such chicken meat is normally distributed, construct a 95% confidence interval for the mean number of calories in one cup of meat.
A college athletic program wishes to estimate the average increase in the total weight an athlete can lift in three different lifts after following a particular training program for six weeks. Twenty-five randomly selected athletes when placed on the program exhibited a mean gain of 47.3 lb with standard deviation 6.4 lb. Construct a 90% confidence interval for the mean increase in lifting capacity all athletes would experience if placed on the training program. Assume increases among all athletes are normally distributed.
To test a new tread design with respect to stopping distance, a tire manufacturer manufactures a set of prototype tires and measures the stopping distance from 70 mph on a standard test car. A sample of 25 stopping distances yielded a sample mean 173 feet with sample standard deviation 8 feet. Construct a 98% confidence interval for the mean stopping distance for these tires. Assume a normal distribution of stopping distances.
A manufacturer of chokes for shotguns tests a choke by shooting 15 patterns at targets 40 yards away with a specified load of shot. The mean number of shot in a 30-inch circle is 53.5 with standard deviation 1.6. Construct an 80% confidence interval for the mean number of shot in a 30-inch circle at 40 yards for this choke with the specified load. Assume a normal distribution of the number of shot in a 30-inch circle at 40 yards for this choke.
In order to estimate the speaking vocabulary of three-year-old children in a particular socioeconomic class, a sociologist studies the speech of four children. The mean and standard deviation of the sample are $\bar{x} = 1120$ and s = 215 words. Assuming that speaking vocabularies are normally distributed, construct an 80% confidence interval for the mean speaking vocabulary of all three-year-old children in this socioeconomic group.
A thread manufacturer tests a sample of eight lengths of a certain type of thread made of blended materials and obtains a mean tensile strength of 8.2 lb with standard deviation 0.06 lb. Assuming tensile strengths are normally distributed, construct a 90% confidence interval for the mean tensile strength of this thread.
An airline wishes to estimate the weight of the paint on a fully painted aircraft of the type it flies. In a sample of four repaintings the average weight of the paint applied was 239 pounds, with sample standard deviation 8 pounds. Assuming that weights of paint on aircraft are normally distributed, construct a 99.8% confidence interval for the mean weight of paint on all such aircraft.
In a study of dummy foal syndrome, the average time between birth and onset of noticeable symptoms in a sample of six foals was 18.6 hours, with standard deviation 1.7 hours. Assuming that the time to onset of symptoms in all foals is normally distributed, construct a 90% confidence interval for the mean time between birth and onset of noticeable symptoms.
A sample of 26 women’s size 6 dresses had mean waist measurement 25.25 inches with sample standard deviation 0.375 inch. Construct a 95% confidence interval for the mean waist measurement of all size 6 women’s dresses. Assume waist measurements are normally distributed.

Additional Exercises

Botanists studying attrition among saplings in new growth areas of forests diligently counted stems in six plots in five-year-old new growth areas, obtaining the following counts of stems per acre:
$\begin{matrix} 9,432 & 11,026 & 10,539 \\ 8,773 & 9,868 & 10,247 \end{matrix}$
Construct an 80% confidence interval for the mean number of stems per acre in all five-year-old new growth areas of forests. Assume that the number of stems per acre is normally distributed.
Nutritionists are investigating the efficacy of a diet plan designed to increase the caloric intake of elderly people. The increase in daily caloric intake in 12 individuals who are put on the plan is (a minus sign signifies that calories consumed went down):
$\begin{matrix} 121 & 284 & − 94 & 295 & 183 & 312 \\ 188 & − 102 & 259 & 226 & 152 & 167 \end{matrix}$
Construct a 99.8% confidence interval for the mean increase in caloric intake for all people who are put on this diet. Assume that population of differences in intake is normally distributed.
A machine for making precision cuts in dimension lumber produces studs with lengths that vary with standard deviation 0.003 inch. Five trial cuts are made to check the machine’s calibration. The mean length of the studs produced is 104.998 inches with sample standard deviation 0.004 inch. Construct a 99.5% confidence interval for the mean lengths of all studs cut by this machine. Assume lengths are normally distributed. Hint: Not all the numbers given in the problem are used.
The variation in time for a baked good to go through a conveyor oven at a large scale bakery has standard deviation 0.017 minute at every time setting. To check the bake time of the oven periodically four batches of goods are carefully timed. The recent check gave a mean of 27.2 minutes with sample standard deviation 0.012 minute. Construct a 99.8% confidence interval for the mean bake time of all batches baked in this oven. Assume bake times are normally distributed. Hint: Not all the numbers given in the problem are used.
Wildlife researchers tranquilized and weighed three adult male polar bears. The data (in pounds) are: 926, 742, 1,109. Assume the weights of all bears are normally distributed.
1. Construct an 80% confidence interval for the mean weight of all adult male polar bears using these data.
2. Convert the three weights in pounds to weights in kilograms using the conversion 1 lb = 0.453 kg (so the first datum changes to $(926) (0.453) = 419$). Use the converted data to construct an 80% confidence interval for the mean weight of all adult male polar bears expressed in kilograms.
3. Convert your answer in part 1 into kilograms directly and compare it to your answer in part 2. This illustrates that if you construct a confidence interval in one system of units you can convert it directly into another system of units without having to convert all the data to the new units.
Wildlife researchers trapped and measured six adult male collared lemmings. The data (in millimeters) are: 104, 99, 112, 115, 96, 109. Assume the lengths of all lemmings are normally distributed.
1. Construct a 90% confidence interval for the mean length of all adult male collared lemmings using these data.
2. Convert the six lengths in millimeters to lengths in inches using the conversion 1 mm = 0.039 in (so the first datum changes to $(104) (0.039) = 4.06$). Use the converted data to construct a 90% confidence interval for the mean length of all adult male collared lemmings expressed in inches.
3. Convert your answer in part 1 into inches directly and compare it to your answer in part 2. This illustrates that if you construct a confidence interval in one system of units you can convert it directly into another system of units without having to convert all the data to the new units.

Answers

1. $98 \pm 3.9$
2. $98 \pm 5.2$
1. $386 \pm 16.4$
2. $386 \pm 33.6$
1. $933 \pm 6.5$
2. $933 \pm 8.5$
3. Asking for greater confidence requires a longer interval.

$32.7 \pm 2.9$
$47.3 \pm 2.19$
$53.5 \pm 0.56$
$8.2 \pm 0.04$
$18.6 \pm 1.4$

$9981 \pm 486$
$104.998 \pm 0.004$
1. $926 \pm 200$
2. $419 \pm 90$
3. $419 \pm 91$