The Sample Proportion

6.3 The Sample Proportion

Learning Objectives

To recognize that the sample proportion $\hat{P}$ is a random variable.
To understand the meaning of the formulas for the mean and standard deviation of the sample proportion.
To learn what the sampling distribution of $\hat{P}$ is when the sample size is large.

Often sampling is done in order to estimate the proportion of a population that has a specific characteristic, such as the proportion of all items coming off an assembly line that are defective or the proportion of all people entering a retail store who make a purchase before leaving. The population proportion is denoted p and the sample proportion is denoted $\hat{p} .$ Thus if in reality 43% of people entering a store make a purchase before leaving, p = 0.43; if in a sample of 200 people entering the store, 78 make a purchase, $\hat{p} = 78 / 200 = 0.39 .$

The sample proportion is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. Viewed as a random variable it will be written $\hat{P} .$ It has a mean $μ_{\hat{P}}$ (the number about which proportions computed from samples of the same size center) and a standard deviation $σ_{\hat{P}}$ (a measure of the variability of proportions computed from samples of the same size). Here are formulas for their values.

Suppose random samples of size n are drawn from a population in which the proportion with a characteristic of interest is p. The mean $μ_{\hat{P}}$ and standard deviation $σ_{\hat{P}}$ of the sample proportion $\hat{P}$ satisfy

$μ_{\hat{P}} = p$ and $σ_{\hat{P}} = \sqrt{\frac{p q}{n}}$

where $q = 1 - p .$
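As a quick illustration, the two formulas can be evaluated directly. The following is a minimal Python sketch; the function name `sample_proportion_mean_sd` is hypothetical and chosen only for this illustration, and the inputs reuse the store example (p = 0.43, n = 200).

```python
import math

def sample_proportion_mean_sd(p, n):
    """Return (mean, standard deviation) of the sample proportion.

    p: population proportion, assumed to lie in [0, 1]
    n: sample size, assumed to be a positive integer
    """
    q = 1 - p
    mean = p                     # mean of P-hat equals p
    sd = math.sqrt(p * q / n)    # standard deviation of P-hat is sqrt(pq/n)
    return mean, sd

# Store example from the text: p = 0.43, samples of size 200.
mean, sd = sample_proportion_mean_sd(0.43, 200)
print(mean, round(sd, 4))
```

Because n appears in the denominator under the square root, quadrupling the sample size halves the standard deviation.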

The Central Limit Theorem has an analogue for the sample proportion $\hat{P} .$ To see how, imagine that every element of the population that has the characteristic of interest is labeled with a 1, and that every element that does not is labeled with a 0. This gives a numerical population consisting entirely of zeros and ones. Clearly the proportion of the population with the special characteristic is the proportion of the numerical population that are ones; in symbols,

$p = \frac{\text{number of 1s}}{N}$

But of course the sum of all the zeros and ones is simply the number of ones, so the mean μ of the numerical population is

$μ = \frac{Σ x}{N} = \frac{\text{number of 1s}}{N}$

Thus the population proportion p is the same as the mean μ of the corresponding population of zeros and ones. In the same way the sample proportion $\hat{p}$ is the same as the sample mean $\bar{x} .$ Thus the Central Limit Theorem applies to $\hat{P} .$ However, the condition that the sample be large is a little more complicated than just being of size at least 30.
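The identification of a proportion with a mean of zeros and ones can be seen in a small Python sketch. The population and the sample below are made-up values used only for illustration.

```python
# A hypothetical population of 10 elements:
# 1 = has the characteristic of interest, 0 = does not.
population = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

N = len(population)
p = population.count(1) / N      # proportion of ones in the population
mu = sum(population) / N         # mean of the zeros and ones

# A hypothetical sample of size 4 taken from this population.
sample = [1, 0, 0, 1]
p_hat = sample.count(1) / len(sample)   # sample proportion
x_bar = sum(sample) / len(sample)       # sample mean

print(p, mu)         # the two population quantities coincide
print(p_hat, x_bar)  # the two sample quantities coincide
```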

The Sampling Distribution of the Sample Proportion

For large samples, the sample proportion is approximately normally distributed, with mean $μ_{\hat{P}} = p$ and standard deviation $σ_{\hat{P}} = \sqrt{p q / n} .$

A sample is large if the interval $[p − 3 σ_{\hat{P}}, p + 3 σ_{\hat{P}}]$ lies wholly within the interval $[0,1] .$
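A small simulation can make these statements concrete. The sketch below is only an illustration under stated assumptions: the values p = 0.5, n = 100, and 2,000 repetitions are arbitrary choices, and each sample is generated by independent draws, which models sampling from a very large population. It compares the mean and standard deviation of the simulated sample proportions with the formulas.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

p, n, reps = 0.5, 100, 2000   # hypothetical values chosen for illustration

def one_sample_proportion(p, n):
    """Draw n independent 0/1 values with P(1) = p and return their proportion of ones."""
    return sum(random.random() < p for _ in range(n)) / n

# Each entry is the sample proportion from one simulated sample of size n.
p_hats = [one_sample_proportion(p, n) for _ in range(reps)]

print("simulated mean:", statistics.mean(p_hats), " formula:", p)
print("simulated sd:  ", statistics.pstdev(p_hats),
      " formula:", math.sqrt(p * (1 - p) / n))
```

The simulated values should land close to, but not exactly on, the values given by the formulas, since the simulation itself is subject to sampling variability.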

In actual practice p is not known, hence neither is $σ_{\hat{P}} .$ In that case in order to check that the sample is sufficiently large we substitute the known quantity $\hat{p}$ for p. This means checking that the interval

$\left[ \hat{p} − 3 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}}, \hat{p} + 3 \sqrt{\frac{\hat{p} (1 - \hat{p})}{n}} \right]$

lies wholly within the interval $[0,1] .$ This is illustrated in the examples.

Figure 6.5 "Distribution of Sample Proportions" shows that when p = 0.1 a sample of size 15 is too small but a sample of size 100 is acceptable. Figure 6.6 "Distribution of Sample Proportions for p = 0.5 and n = 15" shows that when p = 0.5 a sample of size 15 is acceptable.
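The three cases just mentioned can be checked against the three-standard-deviation criterion with a short Python sketch. The function names `three_sigma_interval` and `is_large_sample` are hypothetical, introduced only for this illustration.

```python
import math

def three_sigma_interval(p, n):
    """Return the interval [p - 3*sd, p + 3*sd], where sd = sqrt(p(1-p)/n)."""
    sd = math.sqrt(p * (1 - p) / n)
    return p - 3 * sd, p + 3 * sd

def is_large_sample(p, n):
    """True if the three-standard-deviation interval lies wholly within [0, 1]."""
    low, high = three_sigma_interval(p, n)
    return low >= 0 and high <= 1

# The three (p, n) combinations discussed for the figures.
for p, n in [(0.1, 15), (0.1, 100), (0.5, 15)]:
    low, high = three_sigma_interval(p, n)
    print(p, n, round(low, 3), round(high, 3), is_large_sample(p, n))
```

When p is unknown, the same check can be run with the observed sample proportion passed in place of p, as described above.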

Figure 6.5 Distribution of Sample Proportions

Figure 6.6 Distribution of Sample Proportions for p = 0.5 and n = 15

Example 7

Suppose that in a population of voters in a certain region 38% are in favor of a particular bond issue. Nine hundred randomly selected voters are asked if they favor the bond issue.

Verify that the sample proportion $\hat{P}$ computed from samples of size 900 meets the condition that its sampling distribution be approximately normal.
Find the probability that the sample proportion computed from a sample of size 900 will be within 5 percentage points of the true population proportion.

Solution

The information given is that p = 0.38, hence $q = 1 - p = 0.62 .$ First we use the formulas to compute the mean and standard deviation of $\hat{P}$ :
$μ_{\hat{P}} = p = 0.38$ and $σ_{\hat{P}} = \sqrt{\frac{p q}{n}} = \sqrt{\frac{(0.38) (0.62)}{900}} = 0.01618$
Then $3 σ_{\hat{P}} = 3 (0.01618) = 0.04854 \approx 0.05$ so
$[p − 3 σ_{\hat{P}}, p + 3 σ_{\hat{P}}] = [0.38 - 0.05, 0.38 + 0.05] = [0.33, 0.43]$
which lies wholly within the interval $[0,1]$ , so it is safe to assume that $\hat{P}$ is approximately normally distributed.
To be within 5 percentage points of the true population proportion 0.38 means to be between $0.38 - 0.05 = 0.33$ and $0.38 + 0.05 = 0.43 .$ Thus
$\begin{array}{rl} P (0.33 < \hat{P} < 0.43) & = P \left( \frac{0.33 - μ_{\hat{P}}}{σ_{\hat{P}}} < Z < \frac{0.43 - μ_{\hat{P}}}{σ_{\hat{P}}} \right) \\ & = P \left( \frac{0.33 - 0.38}{0.01618} < Z < \frac{0.43 - 0.38}{0.01618} \right) \\ & = P (− 3.09 < Z < 3.09) \\ & = P (Z < 3.09) - P (Z < − 3.09) \\ & = 0.9990 - 0.0010 = 0.9980 \end{array}$
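The computation in this example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative and not part of the text. Because it does not round z to two decimal places, the result may differ slightly from the table-based answer in the last digit.

```python
import math
from statistics import NormalDist

p, n = 0.38, 900                    # values given in the example
sd = math.sqrt(p * (1 - p) / n)     # standard deviation of P-hat

# Normal approximation to the sampling distribution of P-hat.
dist = NormalDist(mu=p, sigma=sd)

# Probability that P-hat falls within 5 percentage points of p.
prob = dist.cdf(p + 0.05) - dist.cdf(p - 0.05)

print(round(sd, 5), round(prob, 4))
```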

Example 8

An online retailer claims that 90% of all orders are shipped within 12 hours of being received. A consumer group placed 121 orders of different sizes and at different times of day; 102 orders were shipped within 12 hours.

Compute the sample proportion of items shipped within 12 hours.
Confirm that the sample is large enough to assume that the sample proportion is normally distributed. Use p = 0.90, corresponding to the assumption that the retailer’s claim is valid.
Assuming the retailer’s claim is true, find the probability that a sample of size 121 would produce a sample proportion as low as was observed in this sample.
Based on the answer to part (c), draw a conclusion about the retailer’s claim.

Solution

The sample proportion is the number x of orders that are shipped within 12 hours divided by the number n of orders in the sample:
$\hat{p} = \frac{x}{n} = \frac{102}{121} = 0.84$
Since p = 0.90, $q = 1 - p = 0.10$ , and n = 121,
$σ_{\hat{P}} = \sqrt{\frac{(0.90) (0.10)}{121}} = 0.0\overline{27}$
hence
$[p − 3 σ_{\hat{P}}, p + 3 σ_{\hat{P}}] = [0.90 - 0.08, 0.90 + 0.08] = [0.82, 0.98]$
Because $[0.82, 0.98] \subset [0,1],$ it is appropriate to use the normal distribution to compute probabilities related to the sample proportion $\hat{P} .$
Using the value of $\hat{p}$ from part (a) and the computation in part (b),
$\begin{array}{rl} P (\hat{P} \leq 0.84) & = P \left( Z \leq \frac{0.84 - μ_{\hat{P}}}{σ_{\hat{P}}} \right) \\ & = P \left( Z \leq \frac{0.84 - 0.90}{0.0\overline{27}} \right) \\ & = P (Z \leq − 2.20) = 0.0139 \end{array}$
The computation shows that a random sample of size 121 has only about a 1.4% chance of producing a sample proportion as low as the one that was observed, $\hat{p} = 0.84$ , when taken from a population in which the actual proportion is 0.90. This is so unlikely that it is reasonable to conclude that the actual value of p is less than the 90% claimed.
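The steps of this example can be reproduced with a short Python sketch using `statistics.NormalDist` from the standard library; the variable names are illustrative only. Following the text, the sample proportion is rounded to 0.84 before standardizing. Using the unrounded value 102/121 instead gives a somewhat larger probability, which shows that the answer is sensitive to that rounding.

```python
import math
from statistics import NormalDist

p, n, x = 0.90, 121, 102            # claimed proportion, sample size, orders shipped in time
p_hat = x / n                       # observed sample proportion
sd = math.sqrt(p * (1 - p) / n)     # standard deviation of P-hat under the claim

# Part (b): the three-standard-deviation interval must lie within [0, 1].
low, high = p - 3 * sd, p + 3 * sd
large_enough = low >= 0 and high <= 1

# Part (c): probability of a sample proportion at or below the observed one.
z = NormalDist()                                        # standard normal distribution
prob_rounded = z.cdf((round(p_hat, 2) - p) / sd)        # p-hat rounded to 0.84, as in the text
prob_unrounded = z.cdf((p_hat - p) / sd)                # p-hat left as 102/121

print(round(p_hat, 2), large_enough)
print(round(prob_rounded, 4), round(prob_unrounded, 4))
```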

Key Takeaways

The sample proportion is a random variable $\hat{P} .$
There are formulas for the mean $μ_{\hat{P}}$ and standard deviation $σ_{\hat{P}}$ of the sample proportion.
When the sample size is large the sample proportion is approximately normally distributed.

Exercises

Basic

The proportion of a population with a characteristic of interest is p = 0.37. Find the mean and standard deviation of the sample proportion $\hat{P}$ obtained from random samples of size 1,600.
The proportion of a population with a characteristic of interest is p = 0.82. Find the mean and standard deviation of the sample proportion $\hat{P}$ obtained from random samples of size 900.
The proportion of a population with a characteristic of interest is p = 0.76. Find the mean and standard deviation of the sample proportion $\hat{P}$ obtained from random samples of size 1,200.
The proportion of a population with a characteristic of interest is p = 0.37. Find the mean and standard deviation of the sample proportion $\hat{P}$ obtained from random samples of size 125.
Random samples of size 225 are drawn from a population in which the proportion with the characteristic of interest is 0.25. Decide whether or not the sample size is large enough to assume that the sample proportion $\hat{P}$ is normally distributed.
Random samples of size 1,600 are drawn from a population in which the proportion with the characteristic of interest is 0.05. Decide whether or not the sample size is large enough to assume that the sample proportion $\hat{P}$ is normally distributed.
Random samples of size n produced sample proportions $\hat{p}$ as shown. In each case decide whether or not the sample size is large enough to assume that the sample proportion $\hat{P}$ is normally distributed.
1. n = 50, $\hat{p} = 0.48$
2. n = 50, $\hat{p} = 0.12$
3. n = 100, $\hat{p} = 0.12$
Samples of size n produced sample proportions $\hat{p}$ as shown. In each case decide whether or not the sample size is large enough to assume that the sample proportion $\hat{P}$ is normally distributed.
1. n = 30, $\hat{p} = 0.72$
2. n = 30, $\hat{p} = 0.84$
3. n = 75, $\hat{p} = 0.84$
A random sample of size 121 is taken from a population in which the proportion with the characteristic of interest is p = 0.47. Find the indicated probabilities.
1. $P (0.45 \leq \hat{P} \leq 0.50)$
2. $P (\hat{P} \geq 0.50)$
A random sample of size 225 is taken from a population in which the proportion with the characteristic of interest is p = 0.34. Find the indicated probabilities.
1. $P (0.25 \leq \hat{P} \leq 0.40)$
2. $P (\hat{P} \leq 0.35)$
A random sample of size 900 is taken from a population in which the proportion with the characteristic of interest is p = 0.62. Find the indicated probabilities.
1. $P (0.60 \leq \hat{P} \leq 0.64)$
2. $P (0.57 \leq \hat{P} \leq 0.67)$
A random sample of size 1,100 is taken from a population in which the proportion with the characteristic of interest is p = 0.28. Find the indicated probabilities.
1. $P (0.27 \leq \hat{P} \leq 0.29)$
2. $P (0.23 \leq \hat{P} \leq 0.33)$

Applications

Suppose that 8% of all males suffer some form of color blindness. Find the probability that in a random sample of 250 men at least 10% will suffer some form of color blindness. First verify that the sample is sufficiently large to use the normal distribution.
Suppose that 29% of all residents of a community favor annexation by a nearby municipality. Find the probability that in a random sample of 50 residents at least 35% will favor annexation. First verify that the sample is sufficiently large to use the normal distribution.
Suppose that 2% of all cell phone connections by a certain provider are dropped. Find the probability that in a random sample of 1,500 calls at most 40 will be dropped. First verify that the sample is sufficiently large to use the normal distribution.
Suppose that in 20% of all traffic accidents involving an injury, driver distraction in some form (for example, changing a radio station or texting) is a factor. Find the probability that in a random sample of 275 such accidents between 15% and 25% involve driver distraction in some form. First verify that the sample is sufficiently large to use the normal distribution.
An airline claims that 72% of all its flights to a certain region arrive on time. In a random sample of 30 recent arrivals, 19 were on time. You may assume that the normal distribution applies.
1. Compute the sample proportion.
2. Assuming the airline’s claim is true, find the probability of a sample of size 30 producing a sample proportion as low as was observed in this sample.
A humane society reports that 19% of all pet dogs were adopted from an animal shelter. Assuming the truth of this assertion, find the probability that in a random sample of 80 pet dogs, between 15% and 20% were adopted from a shelter. You may assume that the normal distribution applies.
In one study it was found that 86% of all homes have a functional smoke detector. Suppose this proportion is valid for all homes. Find the probability that in a random sample of 600 homes, between 80% and 90% will have a functional smoke detector. You may assume that the normal distribution applies.
A state insurance commission estimates that 13% of all motorists in its state are uninsured. Suppose this proportion is valid. Find the probability that in a random sample of 50 motorists, at least 5 will be uninsured. You may assume that the normal distribution applies.
An outside financial auditor has observed that about 4% of all documents he examines contain an error of some sort. Assuming this proportion to be accurate, find the probability that a random sample of 700 documents will contain at least 30 with some sort of error. You may assume that the normal distribution applies.
Suppose 7% of all households have no home telephone but depend completely on cell phones. Find the probability that in a random sample of 450 households, between 25 and 35 will have no home telephone. You may assume that the normal distribution applies.

Additional Exercises

Some countries allow individual packages of prepackaged goods to weigh less than what is stated on the package, subject to certain conditions, such as the average of all packages being the stated weight or greater. Suppose that one requirement is that at most 4% of all packages marked 500 grams can weigh less than 490 grams. Assuming that a product actually meets this requirement, find the probability that in a random sample of 150 such packages the proportion weighing less than 490 grams is at least 3%. You may assume that the normal distribution applies.
An economist wishes to investigate whether people are keeping cars longer now than in the past. He knows that five years ago, 38% of all passenger vehicles in operation were at least ten years old. He commissions a study in which 325 automobiles are randomly sampled. Of them, 132 are ten years old or older.
1. Find the sample proportion.
2. Find the probability that, when a sample of size 325 is drawn from a population in which the true proportion is 0.38, the sample proportion will be as large as the value you computed in part (a). You may assume that the normal distribution applies.
3. Give an interpretation of the result in part (b). Is there strong evidence that people are keeping their cars longer than was the case five years ago?
A state public health department wishes to investigate the effectiveness of a campaign against smoking. Historically 22% of all adults in the state regularly smoked cigars or cigarettes. In a survey commissioned by the public health department, 279 of 1,500 randomly selected adults stated that they smoke regularly.
1. Find the sample proportion.
2. Find the probability that, when a sample of size 1,500 is drawn from a population in which the true proportion is 0.22, the sample proportion will be no larger than the value you computed in part (a). You may assume that the normal distribution applies.
3. Give an interpretation of the result in part (b). How strong is the evidence that the campaign to reduce smoking has been effective?
In an effort to reduce the population of unwanted cats and dogs, a group of veterinarians set up a low-cost spay/neuter clinic. At the inception of the clinic a survey of pet owners indicated that 78% of all pet dogs and cats in the community were spayed or neutered. After the low-cost clinic had been in operation for three years, that figure had risen to 86%.
1. What information is missing that you would need to compute the probability that a sample drawn from a population in which the proportion is 78% (corresponding to the assumption that the low-cost clinic had had no effect) is as high as 86%?
2. Knowing that the size of the original sample three years ago was 150 and that the size of the recent sample was 125, compute the probability mentioned in part (a). You may assume that the normal distribution applies.
3. Give an interpretation of the result in part (b). How strong is the evidence that the presence of the low-cost clinic has increased the proportion of pet dogs and cats that have been spayed or neutered?
An ordinary die is “fair” or “balanced” if each face has an equal chance of landing on top when the die is rolled. Thus the proportion of times a three is observed in a large number of tosses is expected to be close to 1/6 or $0.1 \bar{6} .$ Suppose a die is rolled 240 times and shows three on top 36 times, for a sample proportion of 0.15.
1. Find the probability that a fair die would produce a proportion of 0.15 or less. You may assume that the normal distribution applies.
2. Give an interpretation of the result in part (a). How strong is the evidence that the die is not fair?
3. Suppose the sample proportion 0.15 came from rolling the die 2,400 times instead of only 240 times. Rework part (a) under these circumstances.
4. Give an interpretation of the result in part (c). How strong is the evidence that the die is not fair?

Answers

$μ_{\hat{P}} = 0.37$ , $σ_{\hat{P}} = 0.012$
$μ_{\hat{P}} = 0.76$ , $σ_{\hat{P}} = 0.012$
$p \pm 3 \sqrt{\frac{p q}{n}} = 0.25 \pm 0.087$ , yes
1. $\hat{p} \pm 3 \sqrt{\frac{\hat{p} \hat{q}}{n}} = 0.48 \pm 0.21$ , yes
2. $\hat{p} \pm 3 \sqrt{\frac{\hat{p} \hat{q}}{n}} = 0.12 \pm 0.14$ , no
3. $\hat{p} \pm 3 \sqrt{\frac{\hat{p} \hat{q}}{n}} = 0.12 \pm 0.10$ , yes
1. 0.4154
2. 0.2546
1. 0.7850
2. 0.9980

$p \pm 3 \sqrt{\frac{p q}{n}} = 0.08 \pm 0.05$ and $[0.03, 0.13] \subset [0,1]$ , 0.1210
$p \pm 3 \sqrt{\frac{p q}{n}} = 0.02 \pm 0.01$ and $[0.01, 0.03] \subset [0,1]$ , 0.9671
1. 0.63
2. 0.1446
0.9977
0.3483

0.7357
1. 0.186
2. 0.0007
3. In a population in which the true proportion is 22% the chance that a random sample of size 1500 would produce a sample proportion of 18.6% or less is only 7/100 of 1%. This is strong evidence that currently a smaller proportion than 22% smoke.
1. 0.2451
2. We would expect a sample proportion of 0.15 or less in about 24.5% of all samples of size 240, so this is practically no evidence at all that the die is not fair.
3. 0.0139
4. We would expect a sample proportion of 0.15 or less in only about 1.4% of all samples of size 2400, so this is strong evidence that the die is not fair.