Sample Size Considerations

9.5 Sample Size Considerations

Learning Objective

To learn how to apply formulas for estimating the sample sizes that will be needed in order to construct a confidence interval for the difference in two population means or proportions that meets given criteria.

As was pointed out at the beginning of Section 7.4 "Sample Size Considerations" in Chapter 7 "Estimation", sampling is typically done with definite objectives in mind. For example, a physician might wish to estimate the difference between the average amount of sleep gotten by patients suffering from a certain condition and the average amount of sleep gotten by healthy adults, at 90% confidence and to within half an hour. Since sampling costs time, effort, and money, it would be useful to be able to estimate the smallest sample sizes that are likely to meet these criteria.

Estimating $μ_{1} - μ_{2}$ with Independent Samples

Assuming that large samples will be required, the confidence interval formula for estimating the difference $μ_{1} - μ_{2}$ between two population means using independent samples is $({\bar{x}}_{1} - {\bar{x}}_{2}) \pm E$ , where

$E = z_{α ∕ 2} \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}$

To say that we wish to estimate the difference in the means to within a certain number of units means that we want the margin of error E to be no larger than that number. The number $z_{α ∕ 2}$ is determined by the desired level of confidence.

The numbers s₁ and s₂ are estimates of the standard deviations $σ_{1}$ and $σ_{2}$ of the two populations. In analogy with what we did in Section 7.4 "Sample Size Considerations" in Chapter 7 "Estimation", we will assume that we either know or can reasonably approximate $σ_{1}$ and $σ_{2}$.

We cannot solve a single equation for both n₁ and n₂, so we have to make an assumption about their relative sizes. We will specify that they be equal. With these assumptions we obtain the minimum sample sizes needed by solving the equation displayed just above for the common value $n_{1} = n_{2}$.
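The algebra behind that step can be written out as follows. This is a sketch under the assumptions just stated: n denotes the common value of the two sample sizes, and s₁ and s₂ have been replaced by the assumed values of the population standard deviations.

```latex
\begin{aligned}
E &= z_{\alpha/2} \sqrt{\frac{\sigma_{1}^{2}}{n} + \frac{\sigma_{2}^{2}}{n}}
   = z_{\alpha/2} \sqrt{\frac{\sigma_{1}^{2} + \sigma_{2}^{2}}{n}} \\
E^{2} &= \frac{(z_{\alpha/2})^{2} (\sigma_{1}^{2} + \sigma_{2}^{2})}{n} \\
n &= \frac{(z_{\alpha/2})^{2} (\sigma_{1}^{2} + \sigma_{2}^{2})}{E^{2}}
\end{aligned}
```

Because a larger n makes E smaller, any whole number at or above this value meets the criterion, which is why the result is rounded up.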

Minimum Equal Sample Sizes for Estimating the Difference in the Means of Two Populations Using Independent Samples

The estimated minimum equal sample sizes $n_{1} = n_{2}$ needed to estimate the difference $μ_{1} - μ_{2}$ in two population means to within E units at $100 (1 - α)$ % confidence are

$n_{1} = n_{2} = \frac{{(z_{α ∕ 2})}^{2} (σ_{1}^{2} + σ_{2}^{2})}{E^{2}}$ (rounded up)

In all the examples and exercises the population standard deviations $σ_{1}$ and $σ_{2}$ will be given.

Example 13

A law firm wishes to estimate the difference in the mean delivery time of documents sent between two of its offices by two different courier companies, to within half an hour and with 99.5% confidence. From their records it will randomly sample the same number n of documents as delivered by each courier company. Determine how large n must be if the estimated standard deviations of the delivery times are 0.75 hour for one company and 1.15 hours for the other.

Solution:

Confidence level 99.5% means that $α = 1 - 0.995 = 0.005$, so $α ∕ 2 = 0.0025$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.0025} = 2.807$.

To say that the estimate is to be “to within half an hour” means that E = 0.5. Thus

$n = \frac{{(z_{α ∕ 2})}^{2} (σ_{1}^{2} + σ_{2}^{2})}{E^{2}} = \frac{{(2.807)}^{2} (0.75^{2} + 1.15^{2})}{0.5^{2}} = 59.40953746$

which we round up to 60, since it is impossible to take a fractional observation. The law firm must sample 60 document deliveries by each company.
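The arithmetic in this example can be checked with a short script. This is only a sketch: the function name `equal_sample_size_means` is a hypothetical helper, not something from the text, and the critical value is passed in as an argument so that the table value 2.807 can be used directly.

```python
import math


def equal_sample_size_means(z, sigma1, sigma2, margin):
    """Evaluate n = z^2 (sigma1^2 + sigma2^2) / E^2 for independent samples.

    Returns the unrounded value and the value rounded up to a whole number.
    """
    raw = z ** 2 * (sigma1 ** 2 + sigma2 ** 2) / margin ** 2
    return raw, math.ceil(raw)


# Values from Example 13: z = 2.807, sigma1 = 0.75, sigma2 = 1.15, E = 0.5
raw, n = equal_sample_size_means(z=2.807, sigma1=0.75, sigma2=1.15, margin=0.5)
print(round(raw, 4), n)  # 59.4095 60
```

Rounding is done with `math.ceil` rather than `round`, since rounding down would give a margin of error slightly larger than the one requested.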

Estimating $μ_{1} - μ_{2}$ with Paired Samples

As we mentioned at the end of Section 9.3 "Comparison of Two Population Means: Paired Samples", if the sample is large (meaning that n ≥ 30) then in the formula for the confidence interval we may replace $t_{α ∕ 2}$ by $z_{α ∕ 2}$ , so that the confidence interval formula becomes $\bar{d} \pm E$ for

$E = z_{α ∕ 2} \frac{s_{d}}{\sqrt{n}}$

The number s_d is an estimate of the standard deviation $σ_{d}$ of the population of differences. We must assume that we either know or can reasonably approximate $σ_{d}$. Thus, assuming that large samples will be required to meet the criteria given, we can solve the displayed equation for n to obtain an estimate of the number of pairs needed in the sample.

Minimum Sample Size for Estimating the Difference in the Means of Two Populations Using Paired Difference Samples

The estimated minimum number of pairs n needed to estimate the difference $μ_{d} = μ_{1} - μ_{2}$ in two population means to within E units at $100 (1 - α)$ % confidence using paired difference samples is

$n = \frac{{(z_{α ∕ 2})}^{2} σ_{d}^{2}}{E^{2}}$ (rounded up)

In all the examples and exercises the population standard deviation of the differences $σ_{d}$ will be given.

Example 14

An automotive tire manufacturer wishes to compare the mean lifetime of two tread designs under actual driving conditions. They will mount one of each type of tire on n vehicles (both on the front or both on the back) and measure the difference in remaining tread after 20,000 miles of driving. If the standard deviation of the differences is assumed to be 0.025 inch, find the minimum sample size needed to estimate the difference in mean depth (at 20,000 miles of use) to within 0.01 inch at 99.9% confidence.

Solution:

Confidence level 99.9% means that $α = 1 - 0.999 = 0.001$, so $α ∕ 2 = 0.0005$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.0005} = 3.291$.

To say that the estimate is to be “to within 0.01 inch” means that E = 0.01. Thus

$n = \frac{{(z_{α ∕ 2})}^{2} σ_{d}^{2}}{E^{2}} = \frac{{(3.291)}^{2} {(0.025)}^{2}}{{(0.01)}^{2}} = 67.69175625$

which we round up to 68. The manufacturer must test 68 pairs of tires.
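The same calculation can be sketched in code without consulting a table, by computing the critical value from the confidence level with the standard library's `statistics.NormalDist`. The helper names `critical_z` and `paired_sample_size` are hypothetical. The computed critical value (about 3.2905) differs slightly from the rounded table value 3.291, but the rounded-up sample size is the same here.

```python
import math
from statistics import NormalDist


def critical_z(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def paired_sample_size(confidence, sigma_d, margin):
    """Evaluate n = z^2 sigma_d^2 / E^2 for paired samples, rounded up."""
    z = critical_z(confidence)
    return math.ceil(z ** 2 * sigma_d ** 2 / margin ** 2)


# Values from Example 14: 99.9% confidence, sigma_d = 0.025 inch, E = 0.01 inch
n_pairs = paired_sample_size(confidence=0.999, sigma_d=0.025, margin=0.01)
print(n_pairs)  # 68
```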

Estimating $p_{1} - p_{2}$

The confidence interval formula for estimating the difference $p_{1} - p_{2}$ between two population proportions is $({\hat{p}}_{1} - {\hat{p}}_{2}) \pm E$ , where

$E = z_{α ∕ 2} \sqrt{\frac{{\hat{p}}_{1} (1 - {\hat{p}}_{1})}{n_{1}} + \frac{{\hat{p}}_{2} (1 - {\hat{p}}_{2})}{n_{2}}}$

Minimum Equal Sample Sizes for Estimating the Difference in Two Population Proportions

The estimated minimum equal sample sizes $n_{1} = n_{2}$ needed to estimate the difference $p_{1} - p_{2}$ in two population proportions to within E percentage points at $100 (1 - α)$ % confidence are

$n_{1} = n_{2} = \frac{{(z_{α ∕ 2})}^{2} ({\hat{p}}_{1} (1 - {\hat{p}}_{1}) + {\hat{p}}_{2} (1 - {\hat{p}}_{2}))}{E^{2}}$ (rounded up)

Here we face the same dilemma that we encountered in the case of a single population proportion: the formula for estimating how large a sample to take contains the numbers ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ , which we know only after we have taken the sample. There are two ways out of this dilemma. Typically the researcher will have some idea as to the values of the population proportions p₁ and p₂, hence of what the sample proportions ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ are likely to be. If so, those estimates can be used in the formula.

The second approach to resolving the dilemma is simply to replace each of ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ in the formula by 0.5. As in the one-population case, this is the most conservative estimate, since it gives the largest possible estimate of n. If we have an estimate of only one of p₁ and p₂ we can use that estimate for it, and use the conservative estimate 0.5 for the other.
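The reason 0.5 gives the largest estimate can be seen by completing the square. The following is a short supporting derivation, not part of the original text; p stands for either sample proportion.

```latex
\begin{aligned}
p(1 - p) &= \tfrac{1}{4} - \left(p - \tfrac{1}{2}\right)^{2} \le \tfrac{1}{4},
  \quad \text{with equality only at } p = \tfrac{1}{2}, \\
\hat{p}_{1}(1 - \hat{p}_{1}) + \hat{p}_{2}(1 - \hat{p}_{2}) &\le \tfrac{1}{2}, \\
n_{1} = n_{2} &\le \frac{(z_{\alpha/2})^{2}}{2E^{2}}.
\end{aligned}
```

So a sample size computed with both proportions set to 0.5 is at least as large as the size computed from any other pair of values.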

Example 15

Find the minimum equal sample sizes necessary to construct a 98% confidence interval for the difference $p_{1} - p_{2}$ with a margin of error E = 0.05,

1. assuming that no prior knowledge about p₁ or p₂ is available; and
2. assuming that prior studies suggest that $p_{1} \approx 0.2$ and $p_{2} \approx 0.3$.

Solution:

Confidence level 98% means that $α = 1 - 0.98 = 0.02$, so $α ∕ 2 = 0.01$. From the last line of Figure 12.3 "Critical Values of " we obtain $z_{0.01} = 2.326$.

1. Since there is no prior knowledge of p₁ or p₂ we make the most conservative estimate that ${\hat{p}}_{1} = 0.5$ and ${\hat{p}}_{2} = 0.5$. Then
  $\begin{matrix} n_{1} = n_{2} = \frac{{(z_{α ∕ 2})}^{2} ({\hat{p}}_{1} (1 - {\hat{p}}_{1}) + {\hat{p}}_{2} (1 - {\hat{p}}_{2}))}{E^{2}} \\ = \frac{{(2.326)}^{2} ((0.5) (0.5) + (0.5) (0.5))}{0.05^{2}} \\ = 1082.0552 \end{matrix}$
  which we round up to 1,083. We must take a sample of size 1,083 from each population.
2. Since $p_{1} \approx 0.2$ we estimate ${\hat{p}}_{1}$ by 0.2, and since $p_{2} \approx 0.3$ we estimate ${\hat{p}}_{2}$ by 0.3. Thus we obtain
  $\begin{matrix} n_{1} = n_{2} = \frac{{(z_{α ∕ 2})}^{2} ({\hat{p}}_{1} (1 - {\hat{p}}_{1}) + {\hat{p}}_{2} (1 - {\hat{p}}_{2}))}{E^{2}} \\ = \frac{{(2.326)}^{2} ((0.2) (0.8) + (0.3) (0.7))}{0.05^{2}} \\ = 800.720848 \end{matrix}$
  which we round up to 801. We must take a sample of size 801 from each population.
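Both parts of this example can be reproduced with one small function. This is a sketch only: the name `equal_sample_size_proportions` is hypothetical, and the default value 0.5 for each proportion encodes the conservative choice used when nothing is known in advance.

```python
import math


def equal_sample_size_proportions(z, margin, p1=0.5, p2=0.5):
    """Evaluate n = z^2 (p1(1 - p1) + p2(1 - p2)) / E^2, rounded up.

    Leaving p1 or p2 at the default 0.5 gives the most conservative estimate.
    """
    raw = z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin ** 2
    return math.ceil(raw)


# Values from Example 15: z = 2.326 (98% confidence), E = 0.05
n_conservative = equal_sample_size_proportions(z=2.326, margin=0.05)
n_informed = equal_sample_size_proportions(z=2.326, margin=0.05, p1=0.2, p2=0.3)
print(n_conservative, n_informed)  # 1083 801
```

If only one proportion has a prior estimate, passing just that argument leaves the other at 0.5, which matches the mixed approach described before the example.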

Key Takeaways

If the population standard deviations $σ_{1}$ and $σ_{2}$ are known or can be estimated, then the minimum equal sizes of independent samples needed to obtain a confidence interval for the difference $μ_{1} - μ_{2}$ in two population means with a given maximum error of the estimate E and a given level of confidence can be estimated.
If the standard deviation $σ_{d}$ of the population of differences in pairs drawn from two populations is known or can be estimated, then the minimum number of sample pairs needed under paired difference sampling to obtain a confidence interval for the difference $μ_{d} = μ_{1} - μ_{2}$ in two population means with a given maximum error of the estimate E and a given level of confidence can be estimated.
The minimum equal sample sizes needed to obtain a confidence interval for the difference in two population proportions with a given maximum error of the estimate and a given level of confidence can always be estimated. If there is prior knowledge of the population proportions p₁ and p₂ then the estimate can be sharpened.

Exercises

Basic

Estimate the common sample size n of equally sized independent samples needed to estimate $μ_{1} - μ_{2}$ as specified when the population standard deviations are as shown.
1. 90% confidence, to within 3 units, $σ_{1} = 10$ and $σ_{2} = 7$
2. 99% confidence, to within 4 units, $σ_{1} = 6.8$ and $σ_{2} = 9.3$
3. 95% confidence, to within 5 units, $σ_{1} = 22.6$ and $σ_{2} = 31.8$
Estimate the common sample size n of equally sized independent samples needed to estimate $μ_{1} - μ_{2}$ as specified when the population standard deviations are as shown.
1. 80% confidence, to within 2 units, $σ_{1} = 14$ and $σ_{2} = 23$
2. 90% confidence, to within 0.3 units, $σ_{1} = 1.3$ and $σ_{2} = 0.8$
3. 99% confidence, to within 11 units, $σ_{1} = 42$ and $σ_{2} = 37$
Estimate the number n of pairs that must be sampled in order to estimate $μ_{d} = μ_{1} - μ_{2}$ as specified when the standard deviation $σ_{d}$ of the population of differences is as shown.
1. 80% confidence, to within 6 units, $σ_{d} = 26.5$
2. 95% confidence, to within 4 units, $σ_{d} = 12$
3. 90% confidence, to within 5.2 units, $σ_{d} = 11.3$
Estimate the number n of pairs that must be sampled in order to estimate $μ_{d} = μ_{1} - μ_{2}$ as specified when the standard deviation $σ_{d}$ of the population of differences is as shown.
1. 90% confidence, to within 20 units, $σ_{d} = 75.5$
2. 95% confidence, to within 11 units, $σ_{d} = 31.4$
3. 99% confidence, to within 1.8 units, $σ_{d} = 4$
Estimate the minimum equal sample sizes $n_{1} = n_{2}$ necessary in order to estimate $p_{1} - p_{2}$ as specified.
1. 80% confidence, to within 0.05 (five percentage points)
  1. when no prior knowledge of p₁ or p₂ is available
  2. when prior studies indicate that $p_{1} \approx 0.20$ and $p_{2} \approx 0.65$
2. 90% confidence, to within 0.02 (two percentage points)
  1. when no prior knowledge of p₁ or p₂ is available
  2. when prior studies indicate that $p_{1} \approx 0.75$ and $p_{2} \approx 0.63$
3. 95% confidence, to within 0.10 (ten percentage points)
  1. when no prior knowledge of p₁ or p₂ is available
  2. when prior studies indicate that $p_{1} \approx 0.11$ and $p_{2} \approx 0.37$
Estimate the minimum equal sample sizes $n_{1} = n_{2}$ necessary in order to estimate $p_{1} - p_{2}$ as specified.
1. 80% confidence, to within 0.02 (two percentage points)
  1. when no prior knowledge of p₁ or p₂ is available
  2. when prior studies indicate that $p_{1} \approx 0.78$ and $p_{2} \approx 0.65$
2. 90% confidence, to within 0.05 (five percentage points)
  1. when no prior knowledge of p₁ or p₂ is available
  2. when prior studies indicate that $p_{1} \approx 0.12$ and $p_{2} \approx 0.24$
3. 95% confidence, to within 0.10 (ten percentage points)
  1. when no prior knowledge of p₁ or p₂ is available
  2. when prior studies indicate that $p_{1} \approx 0.14$ and $p_{2} \approx 0.21$

Applications

An educational researcher wishes to estimate the difference in average scores of elementary school children on two versions of a 100-point standardized test, at 99% confidence and to within two points. Estimate the minimum equal sample sizes necessary if it is known that the standard deviation of scores on different versions of such tests is 4.9.
A university administrator wishes to estimate the difference in mean grade point averages among all men affiliated with fraternities and all unaffiliated men, with 95% confidence and to within 0.15. It is known from prior studies that the standard deviations of grade point averages in the two groups have common value 0.4. Estimate the minimum equal sample sizes necessary to meet these criteria.
An automotive tire manufacturer wishes to estimate the difference in mean wear of tires manufactured with an experimental material and ordinary production tires, with 90% confidence and to within 0.5 mm. To eliminate extraneous factors arising from different driving conditions the tires will be tested in pairs on the same vehicles. It is known from prior studies that the standard deviation of the differences of wear of tires constructed with the two kinds of materials is 1.75 mm. Estimate the minimum number of pairs in the sample necessary to meet these criteria.
To assess the relative happiness of men and women in their marriages, a marriage counselor plans to administer a test measuring happiness in marriage to n randomly selected married couples, record their test scores, find the differences, and then draw inferences on the possible difference. Let $μ_{1}$ and $μ_{2}$ be the true average levels of happiness in marriage for men and women respectively as measured by this test. Suppose it is desired to find a 90% confidence interval for estimating $μ_{d} = μ_{1} - μ_{2}$ to within two test points. Suppose further that, from prior studies, it is known that the standard deviation of the differences in test scores is $σ_{d} \approx 10$. What is the minimum number of married couples that must be included in this study?
A journalist plans to interview an equal number of members of two political parties to compare the proportions in each party who favor a proposal to allow citizens with a proper license to carry a concealed handgun in public parks. Let p₁ and p₂ be the true proportions of members of the two parties who are in favor of the proposal. Suppose it is desired to find a 95% confidence interval for estimating $p_{1} - p_{2}$ to within 0.05. Estimate the minimum equal number of members of each party that must be sampled to meet these criteria.
A member of the state board of education wants to compare the proportions of National Board Certified (NBC) teachers in private high schools and in public high schools in the state. His study plan calls for an equal number of private school teachers and public school teachers to be included in the study. Let p₁ and p₂ be these proportions. Suppose it is desired to find a 99% confidence interval that estimates $p_{1} - p_{2}$ to within 0.05.
1. Supposing that both proportions are known, from a prior study, to be approximately 0.15, compute the minimum common sample size needed.
2. Compute the minimum common sample size needed on the supposition that nothing is known about the values of p₁ and p₂.

Answers

1. $n_{1} = n_{2} = 45$
2. $n_{1} = n_{2} = 56$
3. $n_{1} = n_{2} = 234$
1. $n_{1} = n_{2} = 33$
2. $n_{1} = n_{2} = 35$
3. $n_{1} = n_{2} = 13$
1. 1. $n_{1} = n_{2} = 329$
  2. $n_{1} = n_{2} = 255$
2. 1. $n_{1} = n_{2} = 3383$
  2. $n_{1} = n_{2} = 2846$
3. 1. $n_{1} = n_{2} = 193$
  2. $n_{1} = n_{2} = 128$

$n_{1} = n_{2} \approx 80$
$n_{1} = n_{2} \approx 34$
$n_{1} = n_{2} \approx 769$
