Chapter 9 Two-Sample Problems

The previous two chapters treated the questions of estimating and making inferences about a parameter of a single population. In this chapter we consider a comparison of parameters that belong to two different populations. For example, we might wish to compare the average income of all adults in one region of the country with the average income of those in another region, or we might wish to compare the proportion of all men who are vegetarians with the proportion of all women who are vegetarians.

We will study construction of confidence intervals and tests of hypotheses in four situations, depending on the parameter of interest, the sizes of the samples drawn from each of the populations, and the method of sampling. We also examine sample size considerations.

9.1 Comparison of Two Population Means: Large, Independent Samples

Learning Objectives

To understand the logical framework for estimating the difference between the means of two distinct populations and performing tests of hypotheses concerning those means.
To learn how to construct a confidence interval for the difference in the means of two distinct populations using large, independent samples.
To learn how to perform a test of hypotheses concerning the difference between the means of two distinct populations using large, independent samples.

Suppose we wish to compare the means of two distinct populations. Figure 9.1 "Independent Sampling from Two Populations" illustrates the conceptual framework of our investigation in this and the next section. Each population has a mean and a standard deviation. We arbitrarily label one population as Population 1 and the other as Population 2, and subscript the parameters with the numbers 1 and 2 to tell them apart. We draw a random sample from Population 1 and label the sample statistics it yields with the subscript 1. Without reference to the first sample we draw a sample from Population 2 and label its sample statistics with the subscript 2.

Figure 9.1 Independent Sampling from Two Populations

Definition

Samples from two distinct populations are independent if each one is drawn without reference to the other, and has no connection with the other.

Our goal is to use the information in the samples to estimate the difference $μ_{1} - μ_{2}$ in the means of the two populations and to make statistically valid inferences about it.

Confidence Intervals

Since the mean ${\bar{x}}_{1}$ of the sample drawn from Population 1 is a good estimator of $μ_{1}$ and the mean ${\bar{x}}_{2}$ of the sample drawn from Population 2 is a good estimator of $μ_{2}$, a reasonable point estimate of the difference $μ_{1} - μ_{2}$ is ${\bar{x}}_{1} - {\bar{x}}_{2}$. In order to widen this point estimate into a confidence interval, we first suppose that both samples are large, that is, that both $n_{1} \geq 30$ and $n_{2} \geq 30$. If so, then the following formula for a confidence interval for $μ_{1} - μ_{2}$ is valid. The symbols $s_{1}^{2}$ and $s_{2}^{2}$ denote the squares of the sample standard deviations $s_{1}$ and $s_{2}$. (In the relatively rare case that both population standard deviations $σ_{1}$ and $σ_{2}$ are known, they would be used instead of the sample standard deviations.)

$100(1 - α)\%$ Confidence Interval for the Difference Between Two Population Means: Large, Independent Samples

({\bar{x}}_{1} - {\bar{x}}_{2}) \pm z_{α ∕ 2} \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}

The samples must be independent, and each sample must be large: $n_{1} \geq 30$ and $n_{2} \geq 30$.
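The square-root term in the formula is an estimate of the standard deviation of the point estimate ${\bar{x}}_{1} - {\bar{x}}_{2}$. A brief sketch of where it comes from is given below. It assumes independent samples, so that the variances of the two sample means add; the capital letters $\bar{X}_{1}$ and $\bar{X}_{2}$, denoting the sample means viewed as random variables, are notation introduced here for illustration only.

```latex
\operatorname{Var}(\bar{X}_{1} - \bar{X}_{2})
  = \operatorname{Var}(\bar{X}_{1}) + \operatorname{Var}(\bar{X}_{2})
  = \frac{\sigma_{1}^{2}}{n_{1}} + \frac{\sigma_{2}^{2}}{n_{2}}
  \approx \frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}
```

The final approximation replaces the unknown population variances by the sample variances, which is reasonable when both samples are large.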

Example 1

To compare customer satisfaction levels of two competing cable television companies, 174 customers of Company 1 and 355 customers of Company 2 were randomly selected and were asked to rate their cable companies on a five-point scale, with 1 being least satisfied and 5 most satisfied. The survey results are summarized in the following table:

Company 1	Company 2
$n_{1} = 174$	$n_{2} = 355$
${\bar{x}}_{1} = 3.51$	${\bar{x}}_{2} = 3.24$
$s_{1} = 0.51$	$s_{2} = 0.52$

Construct a point estimate and a 99% confidence interval for $μ_{1} - μ_{2}$ , the difference in average satisfaction levels of customers of the two companies as measured on this five-point scale.

Solution:

The point estimate of $μ_{1} - μ_{2}$ is

{\bar{x}}_{1} - {\bar{x}}_{2} = 3.51 - 3.24 = 0.27 .

In words, we estimate that the average customer satisfaction level for Company 1 is 0.27 points higher on this five-point scale than it is for Company 2.

To apply the formula for the confidence interval, proceed exactly as was done in Chapter 7 "Estimation". The 99% confidence level means that $α = 1 - 0.99 = 0.01$ so that $z_{α ∕ 2} = z_{0.005}$. From Figure 12.3 "Critical Values of " we read directly that $z_{0.005} = 2.576$. Thus

({\bar{x}}_{1} - {\bar{x}}_{2}) \pm z_{α ∕ 2} \sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}} = 0.27 \pm 2.576 \sqrt{\frac{0.51^{2}}{174} + \frac{0.52^{2}}{355}} = 0.27 \pm 0.12

We are 99% confident that the difference in the population means lies in the interval $[0.15, 0.39]$, in the sense that in repeated sampling 99% of all intervals constructed from the sample data in this manner will contain $μ_{1} - μ_{2}$. In the context of the problem we say we are 99% confident that the average level of customer satisfaction for Company 1 is between 0.15 and 0.39 points higher, on this five-point scale, than that for Company 2.
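The computation in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `two_mean_ci` and its argument names are hypothetical, and the critical value is obtained from Python's standard library rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_ci(xbar1, s1, n1, xbar2, s2, n2, confidence):
    """Large-sample confidence interval for mu1 - mu2 (independent samples).

    Hypothetical helper written for illustration; returns the point
    estimate, the margin of error, and the interval endpoints.
    """
    if n1 < 30 or n2 < 30:
        raise ValueError("both samples must have at least 30 observations")
    alpha = 1 - confidence
    # z_{alpha/2}: the value that cuts off a right tail of area alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    point = xbar1 - xbar2
    margin = z * sqrt(s1**2 / n1 + s2**2 / n2)
    return point, margin, (point - margin, point + margin)


# Data from Example 1 (Company 1 vs. Company 2), 99% confidence
point, margin, (low, high) = two_mean_ci(3.51, 0.51, 174, 3.24, 0.52, 355, 0.99)
print(f"{point:.2f} +/- {margin:.2f} -> [{low:.2f}, {high:.2f}]")
```

Rounded to two decimal places, the result should agree with the interval $[0.15, 0.39]$ found by hand.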

Hypothesis Testing

Hypotheses concerning the relative sizes of the means of two populations are tested using the same critical value and p-value procedures that were used in the case of a single population. All that is needed is to know how to express the null and alternative hypotheses and to know the formula for the standardized test statistic and the distribution that it follows.

The null and alternative hypotheses will always be expressed in terms of the difference of the two population means. Thus the null hypothesis will always be written

H_{0} : μ_{1} - μ_{2} = D_{0}

where D₀ is a number that is deduced from the statement of the situation. As was the case with a single population the alternative hypothesis can take one of the three forms, with the same terminology:

Form of $H_{a}$	Terminology
$H_{a} : μ_{1} - μ_{2} < D_{0}$	Left-tailed
$H_{a} : μ_{1} - μ_{2} > D_{0}$	Right-tailed
$H_{a} : μ_{1} - μ_{2} \neq D_{0}$	Two-tailed

As long as the samples are independent and both are large the following formula for the standardized test statistic is valid, and it has the standard normal distribution. (In the relatively rare case that both population standard deviations $σ_{1}$ and $σ_{2}$ are known they would be used instead of the sample standard deviations.)

Standardized Test Statistic for Hypothesis Tests Concerning the Difference Between Two Population Means: Large, Independent Samples

Z = \frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - D_{0}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}}

The test statistic has the standard normal distribution.

The samples must be independent, and each sample must be large: $n_{1} \geq 30$ and $n_{2} \geq 30$.
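As an illustration of the formula with a nonzero $D_{0}$, here is a short sketch in Python. The function name `two_mean_z` is hypothetical. The data are those of the first basic hypothesis-testing exercise at the end of this section ($H_{0} : μ_{1} - μ_{2} = 3$, with $n_{1} = 35$, ${\bar{x}}_{1} = 25$, $s_{1} = 1$ and $n_{2} = 45$, ${\bar{x}}_{2} = 19$, $s_{2} = 2$).

```python
from math import sqrt


def two_mean_z(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Standardized test statistic for H0: mu1 - mu2 = d0.

    Hypothetical helper; valid only for large, independent samples.
    """
    if n1 < 30 or n2 < 30:
        raise ValueError("both samples must have at least 30 observations")
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - d0) / standard_error


z = two_mean_z(25, 1, 35, 19, 2, 45, d0=3)
print(round(z, 3))
```

The printed value should match the test statistic Z = 8.753 given for that exercise in the answers.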

Example 2

Refer to Note 9.4 "Example 1" concerning the mean satisfaction levels of customers of two competing cable television companies. Test at the 1% level of significance whether the data provide sufficient evidence to conclude that Company 1 has a higher mean satisfaction rating than does Company 2. Use the critical value approach.

Solution:

Step 1. If the mean satisfaction levels $μ_{1}$ and $μ_{2}$ are the same then $μ_{1} = μ_{2}$ , but we always express the null hypothesis in terms of the difference between $μ_{1}$ and $μ_{2}$ , hence H₀ is $μ_{1} - μ_{2} = 0 .$ To say that the mean customer satisfaction for Company 1 is higher than that for Company 2 means that $μ_{1} > μ_{2}$ , which in terms of their difference is $μ_{1} - μ_{2} > 0 .$ The test is therefore
$\begin{matrix} H_{0} : μ_{1} - μ_{2} & = & 0 & \\ \text{vs. } H_{a} : μ_{1} - μ_{2} & > & 0 & @ \; α = 0.01 \end{matrix}$
Step 2. Since the samples are independent and both are large the test statistic is
$Z = \frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - D_{0}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}}$
Step 3. Inserting the data into the formula for the test statistic gives
$Z = \frac{({\bar{x}}_{1} - {\bar{x}}_{2}) - D_{0}}{\sqrt{\frac{s_{1}^{2}}{n_{1}} + \frac{s_{2}^{2}}{n_{2}}}} = \frac{(3.51 - 3.24) - 0}{\sqrt{\frac{0.51^{2}}{174} + \frac{0.52^{2}}{355}}} = 5.684$
Step 4. Since the symbol in $H_{a}$ is “>” this is a right-tailed test, so there is a single critical value, $z_{α} = z_{0.01}$, which from the last line in Figure 12.3 "Critical Values of " we read off as 2.326. The rejection region is $[2.326, \infty)$.

Figure 9.2 Rejection Region and Test Statistic for Note 9.6 "Example 2"

Step 5. As shown in Figure 9.2, the test statistic falls in the rejection region. The decision is to reject H₀. In the context of the problem our conclusion is:

The data provide sufficient evidence, at the 1% level of significance, to conclude that the mean customer satisfaction for Company 1 is higher than that for Company 2.
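The critical value approach of Example 2 can be sketched in code as follows. This is an illustrative sketch only: the function `reject_h0` and the labels "left", "right", and "two" for the three forms of $H_{a}$ are hypothetical names chosen here, and the critical value comes from Python's `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def reject_h0(z, alpha, tail):
    """Critical-value decision for a standard normal test statistic.

    tail is "left", "right", or "two", matching the form of H_a.
    Returns (decision, critical value). Hypothetical helper.
    """
    std_normal = NormalDist()
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)      # z_alpha
        return z >= critical, critical
    if tail == "left":
        critical = -std_normal.inv_cdf(1 - alpha)     # -z_alpha
        return z <= critical, critical
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
        return abs(z) >= critical, critical
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Example 2: H0: mu1 - mu2 = 0 vs. Ha: mu1 - mu2 > 0 at alpha = 0.01
z = ((3.51 - 3.24) - 0) / sqrt(0.51**2 / 174 + 0.52**2 / 355)
reject, critical = reject_h0(z, alpha=0.01, tail="right")
print(f"Z = {z:.3f}, critical value = {critical:.3f}, reject H0: {reject}")
```

The decision should agree with Step 5: the test statistic lies in the rejection region, so H₀ is rejected.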

Example 3

Perform the test of Note 9.6 "Example 2" using the p-value approach.

Solution:

The first three steps are identical to those in Note 9.6 "Example 2".

Step 4. The observed significance or p-value of the test is the area of the right tail of the standard normal distribution that is cut off by the test statistic Z = 5.684. The number 5.684 is too large to appear in Figure 12.2 "Cumulative Normal Probability", which means that the area of the left tail that it cuts off is 1.0000 to four decimal places. The area that we seek, the area of the right tail, is therefore $1 - 1.0000 = 0.0000$ to four decimal places. See Figure 9.3. That is, p-value = 0.0000 to four decimal places. (The actual value is approximately $0.000000007$.)

Figure 9.3 P-Value for Note 9.7 "Example 3"

Step 5. Since 0.0000 < 0.01, p-value $< α$, so the decision is to reject the null hypothesis:

The data provide sufficient evidence, at the 1% level of significance, to conclude that the mean customer satisfaction for Company 1 is higher than that for Company 2.
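When software is available, the p-value in Example 3 need not be rounded to 0.0000. The sketch below is one possible way to compute it, under the assumption that the complementary error function `math.erfc` is used for the tail area (this avoids the loss of precision in computing 1 minus a number very close to 1). The function names `upper_tail` and `p_value` are hypothetical.

```python
from math import erfc, sqrt


def upper_tail(z):
    """Area to the right of z under the standard normal curve."""
    return 0.5 * erfc(z / sqrt(2))


def p_value(z, tail):
    """Observed significance for a left-, right-, or two-tailed test."""
    if tail == "right":
        return upper_tail(z)
    if tail == "left":
        return upper_tail(-z)        # area to the left of z
    if tail == "two":
        return 2 * upper_tail(abs(z))
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Example 3: right-tailed test with the statistic from Example 2
z = ((3.51 - 3.24) - 0) / sqrt(0.51**2 / 174 + 0.52**2 / 355)
p = p_value(z, "right")
print(f"p-value = {p:.4f} to four decimal places ({p:.1e} more precisely)")
```

To four decimal places the result is 0.0000, as in Step 4, and the more precise figure should be of the same order as the approximate value quoted there.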

Key Takeaways

A point estimate for the difference in two population means is simply the difference in the corresponding sample means.
In the context of estimating or testing hypotheses concerning two population means, “large” samples means that both samples are large: $n_{1} \geq 30$ and $n_{2} \geq 30$.
A confidence interval for the difference in two population means is computed using a formula in the same fashion as was done for a single population mean.
The same five-step procedure used to test hypotheses concerning a single population mean is used to test hypotheses concerning the difference between two population means. The only difference is in the formula for the standardized test statistic.

Exercises

Basic

Construct the confidence interval for $μ_{1} - μ_{2}$ for the level of confidence and the data from independent samples given.
1. 90% confidence,
  
  $n_{1} = 45$ , ${\bar{x}}_{1} = 27$ , $s_{1} = 2$
  
  $n_{2} = 60$ , ${\bar{x}}_{2} = 22$ , $s_{2} = 3$
2. 99% confidence,
  
  $n_{1} = 30$ , ${\bar{x}}_{1} = -112$ , $s_{1} = 9$
  
  $n_{2} = 40$ , ${\bar{x}}_{2} = -98$ , $s_{2} = 4$
Construct the confidence interval for $μ_{1} - μ_{2}$ for the level of confidence and the data from independent samples given.
1. 95% confidence,
  
  $n_{1} = 110$ , ${\bar{x}}_{1} = 77$ , $s_{1} = 15$
  
  $n_{2} = 85$ , ${\bar{x}}_{2} = 79$ , $s_{2} = 21$
2. 90% confidence,
  
  $n_{1} = 65$ , ${\bar{x}}_{1} = -83$ , $s_{1} = 12$
  
  $n_{2} = 65$ , ${\bar{x}}_{2} = -74$ , $s_{2} = 8$
Construct the confidence interval for $μ_{1} - μ_{2}$ for the level of confidence and the data from independent samples given.
1. 99.5% confidence,
  
  $n_{1} = 130$ , ${\bar{x}}_{1} = 27.2$ , $s_{1} = 2.5$
  
  $n_{2} = 155$ , ${\bar{x}}_{2} = 38.8$ , $s_{2} = 4.6$
2. 95% confidence,
  
  $n_{1} = 68$ , ${\bar{x}}_{1} = 215.5$ , $s_{1} = 12.3$
  
  $n_{2} = 84$ , ${\bar{x}}_{2} = 287.8$ , $s_{2} = 14.1$
Construct the confidence interval for $μ_{1} - μ_{2}$ for the level of confidence and the data from independent samples given.
1. 99.9% confidence,
  
  $n_{1} = 275$ , ${\bar{x}}_{1} = 70.2$ , $s_{1} = 1.5$
  
  $n_{2} = 325$ , ${\bar{x}}_{2} = 63.4$ , $s_{2} = 1.1$
2. 90% confidence,
  
  $n_{1} = 120$ , ${\bar{x}}_{1} = 35.5$ , $s_{1} = 0.75$
  
  $n_{2} = 146$ , ${\bar{x}}_{2} = 29.6$ , $s_{2} = 0.80$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the critical value approach. Compute the p-value of the test as well.
1. Test $H_{0} : μ_{1} - μ_{2} = 3$ vs. $H_{a} : μ_{1} - μ_{2} \neq 3$ @ $α = 0.05$ ,
  
  $n_{1} = 35$ , ${\bar{x}}_{1} = 25$ , $s_{1} = 1$
  
  $n_{2} = 45$ , ${\bar{x}}_{2} = 19$ , $s_{2} = 2$
2. Test $H_{0} : μ_{1} - μ_{2} = -25$ vs. $H_{a} : μ_{1} - μ_{2} < -25$ @ $α = 0.10$ ,
  
  $n_{1} = 85$ , ${\bar{x}}_{1} = 188$ , $s_{1} = 15$
  
  $n_{2} = 62$ , ${\bar{x}}_{2} = 215$ , $s_{2} = 19$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the critical value approach. Compute the p-value of the test as well.
1. Test $H_{0} : μ_{1} - μ_{2} = 45$ vs. $H_{a} : μ_{1} - μ_{2} > 45$ @ $α = 0.001$ ,
  
  $n_{1} = 200$ , ${\bar{x}}_{1} = 1312$ , $s_{1} = 35$
  
  $n_{2} = 225$ , ${\bar{x}}_{2} = 1256$ , $s_{2} = 28$
2. Test $H_{0} : μ_{1} - μ_{2} = -12$ vs. $H_{a} : μ_{1} - μ_{2} \neq -12$ @ $α = 0.10$ ,
  
  $n_{1} = 35$ , ${\bar{x}}_{1} = 121$ , $s_{1} = 6$
  
  $n_{2} = 40$ , ${\bar{x}}_{2} = 135$ , $s_{2} = 7$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the critical value approach. Compute the p-value of the test as well.
1. Test $H_{0} : μ_{1} - μ_{2} = 0$ vs. $H_{a} : μ_{1} - μ_{2} \neq 0$ @ $α = 0.01$ ,
  
  $n_{1} = 125$ , ${\bar{x}}_{1} = -46$ , $s_{1} = 10$
  
  $n_{2} = 90$ , ${\bar{x}}_{2} = -50$ , $s_{2} = 13$
2. Test $H_{0} : μ_{1} - μ_{2} = 20$ vs. $H_{a} : μ_{1} - μ_{2} > 20$ @ $α = 0.05$ ,
  
  $n_{1} = 40$ , ${\bar{x}}_{1} = 142$ , $s_{1} = 11$
  
  $n_{2} = 40$ , ${\bar{x}}_{2} = 118$ , $s_{2} = 10$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the critical value approach. Compute the p-value of the test as well.
1. Test $H_{0} : μ_{1} - μ_{2} = 13$ vs. $H_{a} : μ_{1} - μ_{2} < 13$ @ $α = 0.01$ ,
  
  $n_{1} = 35$ , ${\bar{x}}_{1} = 100$ , $s_{1} = 2$
  
  $n_{2} = 35$ , ${\bar{x}}_{2} = 88$ , $s_{2} = 2$
2. Test $H_{0} : μ_{1} - μ_{2} = -10$ vs. $H_{a} : μ_{1} - μ_{2} \neq -10$ @ $α = 0.10$ ,
  
  $n_{1} = 146$ , ${\bar{x}}_{1} = 62$ , $s_{1} = 4$
  
  $n_{2} = 120$ , ${\bar{x}}_{2} = 73$ , $s_{2} = 7$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the p-value approach.
1. Test $H_{0} : μ_{1} - μ_{2} = 57$ vs. $H_{a} : μ_{1} - μ_{2} < 57$ @ $α = 0.10$ ,
  
  $n_{1} = 117$ , ${\bar{x}}_{1} = 1309$ , $s_{1} = 42$
  
  $n_{2} = 133$ , ${\bar{x}}_{2} = 1258$ , $s_{2} = 37$
2. Test $H_{0} : μ_{1} - μ_{2} = -1.5$ vs. $H_{a} : μ_{1} - μ_{2} \neq -1.5$ @ $α = 0.20$ ,
  
  $n_{1} = 65$ , ${\bar{x}}_{1} = 16.9$ , $s_{1} = 1.3$
  
  $n_{2} = 57$ , ${\bar{x}}_{2} = 18.6$ , $s_{2} = 1.1$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the p-value approach.
1. Test $H_{0} : μ_{1} - μ_{2} = -10.5$ vs. $H_{a} : μ_{1} - μ_{2} > -10.5$ @ $α = 0.01$ ,
  
  $n_{1} = 64$ , ${\bar{x}}_{1} = 85.6$ , $s_{1} = 2.4$
  
  $n_{2} = 50$ , ${\bar{x}}_{2} = 95.3$ , $s_{2} = 3.1$
2. Test $H_{0} : μ_{1} - μ_{2} = 110$ vs. $H_{a} : μ_{1} - μ_{2} \neq 110$ @ $α = 0.02$ ,
  
  $n_{1} = 176$ , ${\bar{x}}_{1} = 1918$ , $s_{1} = 68$
  
  $n_{2} = 241$ , ${\bar{x}}_{2} = 1782$ , $s_{2} = 146$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the p-value approach.
1. Test $H_{0} : μ_{1} - μ_{2} = 50$ vs. $H_{a} : μ_{1} - μ_{2} > 50$ @ $α = 0.005$ ,
  
  $n_{1} = 72$ , ${\bar{x}}_{1} = 272$ , $s_{1} = 26$
  
  $n_{2} = 103$ , ${\bar{x}}_{2} = 213$ , $s_{2} = 14$
2. Test $H_{0} : μ_{1} - μ_{2} = 7.5$ vs. $H_{a} : μ_{1} - μ_{2} \neq 7.5$ @ $α = 0.10$ ,
  
  $n_{1} = 52$ , ${\bar{x}}_{1} = 94.3$ , $s_{1} = 2.6$
  
  $n_{2} = 38$ , ${\bar{x}}_{2} = 88.6$ , $s_{2} = 8.0$
Perform the test of hypotheses indicated, using the data from independent samples given. Use the p-value approach.
1. Test $H_{0} : μ_{1} - μ_{2} = 23$ vs. $H_{a} : μ_{1} - μ_{2} < 23$ @ $α = 0.20$ ,
  
  $n_{1} = 314$ , ${\bar{x}}_{1} = 198$ , $s_{1} = 12.2$
  
  $n_{2} = 220$ , ${\bar{x}}_{2} = 176$ , $s_{2} = 11.5$
2. Test $H_{0} : μ_{1} - μ_{2} = 4.4$ vs. $H_{a} : μ_{1} - μ_{2} \neq 4.4$ @ $α = 0.05$ ,
  
  $n_{1} = 32$ , ${\bar{x}}_{1} = 40.3$ , $s_{1} = 0.5$
  
  $n_{2} = 30$ , ${\bar{x}}_{2} = 35.5$ , $s_{2} = 0.7$

Applications

In order to investigate the relationship between mean job tenure in years among workers who have a bachelor’s degree or higher and those who do not, random samples of each type of worker were taken, with the following results.

	n	$\bar{x}$	s
Bachelor’s degree or higher	155	5.2	1.3
No degree	210	5.0	1.5

1. Construct the 99% confidence interval for the difference in the population means based on these data.
2. Test, at the 1% level of significance, the claim that mean job tenure among those with higher education is greater than among those without, against the default that there is no difference in the means.
3. Compute the observed significance of the test.

Records of 40 used passenger cars and 40 used pickup trucks (none used commercially) were randomly selected to investigate whether there was any difference in the mean time in years that they were kept by the original owner before being sold. For cars the mean was 5.3 years with standard deviation 2.2 years. For pickup trucks the mean was 7.1 years with standard deviation 3.0 years.
1. Construct the 95% confidence interval for the difference in the means based on these data.
2. Test the hypothesis that there is a difference in the means against the null hypothesis that there is no difference. Use the 1% level of significance.
3. Compute the observed significance of the test in part 2.

In previous years the average number of patients per hour at a hospital emergency room on weekends exceeded the average on weekdays by 6.3 patients per hour. A hospital administrator believes that the current weekend mean exceeds the weekday mean by fewer than 6.3 patients per hour.

1. Construct the 99% confidence interval for the difference in the population means based on the following data, derived from a study in which 30 weekend and 30 weekday one-hour periods were randomly selected and the number of new patients in each recorded.

	n	$\bar{x}$	s
Weekends	30	13.8	3.1
Weekdays	30	8.6	2.7

2. Test at the 5% level of significance whether the current weekend mean exceeds the weekday mean by fewer than 6.3 patients per hour.
3. Compute the observed significance of the test.

A sociologist surveys 50 randomly selected citizens in each of two countries to compare the mean number of hours of volunteer work done by adults in each. Among the 50 inhabitants of Lilliput, the mean hours of volunteer work per year was 52, with standard deviation 11.8. Among the 50 inhabitants of Blefuscu, the mean number of hours of volunteer work per year was 37, with standard deviation 7.2.
1. Construct the 99% confidence interval for the difference in mean number of hours volunteered by all residents of Lilliput and the mean number of hours volunteered by all residents of Blefuscu.
2. Test, at the 1% level of significance, the claim that the mean number of hours volunteered by all residents of Lilliput is more than ten hours greater than the mean number of hours volunteered by all residents of Blefuscu.
3. Compute the observed significance of the test in part 2.

A university administrator asserted that upperclassmen spend more time studying than underclassmen.

1. Test this claim against the default that the average number of hours of study per week by the two groups is the same, using the following information based on random samples from each group of students. Test at the 1% level of significance.

	n	$\bar{x}$	s
Upperclassmen	35	15.6	2.9
Underclassmen	35	12.3	4.1

2. Compute the observed significance of the test.

A kinesiologist claims that the resting heart rate of men aged 18 to 25 who exercise regularly is more than five beats per minute less than that of men who do not exercise regularly. Men in each category were selected at random and their resting heart rates were measured, with the results shown.

	n	$\bar{x}$	s
Regular exercise	40	63	1.0
No regular exercise	30	71	1.2

1. Perform the relevant test of hypotheses at the 1% level of significance.
2. Compute the observed significance of the test.

Children in two elementary school classrooms were given two versions of the same test, but with the order of questions arranged from easier to more difficult in Version A and in reverse order in Version B. Randomly selected students from each class were given Version A and the rest Version B. The results are shown in the table.

	n	$\bar{x}$	s
Version A	31	83	4.6
Version B	32	78	4.3

1. Construct the 90% confidence interval for the difference in the means of the populations of all children taking Version A of such a test and of all children taking Version B of such a test.
2. Test at the 1% level of significance the hypothesis that the A version of the test is easier than the B version (even though the questions are the same).
3. Compute the observed significance of the test.

The Municipal Transit Authority wants to know if, on weekdays, more passengers ride the northbound blue line train towards the city center that departs at 8:15 a.m. or the one that departs at 8:30 a.m. The following sample statistics are assembled by the Transit Authority.

	n	$\bar{x}$	s
8:15 a.m. train	30	323	41
8:30 a.m. train	45	356	45

1. Construct the 90% confidence interval for the difference in the mean number of daily travellers on the 8:15 train and the mean number of daily travellers on the 8:30 train.
2. Test at the 5% level of significance whether the data provide sufficient evidence to conclude that more passengers ride the 8:30 train.
3. Compute the observed significance of the test.

In comparing the academic performance of college students who are affiliated with fraternities and those male students who are unaffiliated, a random sample of students was drawn from each of the two populations on a university campus. Summary statistics on the student GPAs are given below.

	n	$\bar{x}$	s
Fraternity	645	2.90	0.47
Unaffiliated	450	2.88	0.42

Test, at the 5% level of significance, whether the data provide sufficient evidence to conclude that there is a difference in average GPA between the population of fraternity students and the population of unaffiliated male students on this university campus.

In comparing the academic performance of college students who are affiliated with sororities and those female students who are unaffiliated, a random sample of students was drawn from each of the two populations on a university campus. Summary statistics on the student GPAs are given below.

	n	$\bar{x}$	s
Sorority	330	3.18	0.37
Unaffiliated	550	3.12	0.41

Test, at the 5% level of significance, whether the data provide sufficient evidence to conclude that there is a difference in average GPA between the population of sorority students and the population of unaffiliated female students on this university campus.

The owner of a professional football team believes that the league has become more offense oriented since five years ago. To check his belief, 32 randomly selected games from one year’s schedule were compared to 32 randomly selected games from the schedule five years later. Since more offense produces more points per game, the owner analyzed the following information on points per game (ppg).

	n	$\bar{x}$	s
ppg previously	32	20.62	4.17
ppg recently	32	22.05	4.01

Test, at the 10% level of significance, whether the data on points per game provide sufficient evidence to conclude that the game has become more offense oriented.

The owner of a professional football team believes that the league has become more offense oriented since five years ago. To check his belief, 32 randomly selected games from one year’s schedule were compared to 32 randomly selected games from the schedule five years later. Since more offense produces more offensive yards per game, the owner analyzed the following information on offensive yards per game (oypg).

	n	$\bar{x}$	s
oypg previously	32	316	40
oypg recently	32	336	35

Test, at the 10% level of significance, whether the data on offensive yards per game provide sufficient evidence to conclude that the game has become more offense oriented.

Large Data Set Exercises

Large Data Sets 1A and 1B list the SAT scores for 1,000 randomly selected students. Denote the population of all male students as Population 1 and the population of all female students as Population 2.

http://www.flatworldknowledge.com/sites/all/files/data1A.xls

http://www.flatworldknowledge.com/sites/all/files/data1B.xls
1. Restricting attention to just the males, find n₁, ${\bar{x}}_{1}$ , and s₁. Restricting attention to just the females, find n₂, ${\bar{x}}_{2}$ , and s₂.
2. Let $μ_{1}$ denote the mean SAT score for all males and $μ_{2}$ the mean SAT score for all females. Use the results of part 1 to construct a 90% confidence interval for the difference $μ_{1} - μ_{2}$.
3. Test, at the 5% level of significance, the hypothesis that the mean SAT score among males exceeds that among females.
Large Data Sets 1A and 1B list the GPAs for 1,000 randomly selected students. Denote the population of all male students as Population 1 and the population of all female students as Population 2.

http://www.flatworldknowledge.com/sites/all/files/data1A.xls

http://www.flatworldknowledge.com/sites/all/files/data1B.xls
1. Restricting attention to just the males, find n₁, ${\bar{x}}_{1}$ , and s₁. Restricting attention to just the females, find n₂, ${\bar{x}}_{2}$ , and s₂.
2. Let $μ_{1}$ denote the mean GPA for all males and $μ_{2}$ the mean GPA for all females. Use the results of part 1 to construct a 95% confidence interval for the difference $μ_{1} - μ_{2}$.
3. Test, at the 10% level of significance, the hypothesis that the mean GPAs among males and females differ.
Large Data Sets 7A and 7B list the survival times for 65 male and 75 female laboratory mice with thymic leukemia. Denote the population of all such male mice as Population 1 and the population of all such female mice as Population 2.

http://www.flatworldknowledge.com/sites/all/files/data7A.xls

http://www.flatworldknowledge.com/sites/all/files/data7B.xls
1. Restricting attention to just the males, find n₁, ${\bar{x}}_{1}$ , and s₁. Restricting attention to just the females, find n₂, ${\bar{x}}_{2}$ , and s₂.
2. Let $μ_{1}$ denote the mean survival time for all males and $μ_{2}$ the mean survival time for all females. Use the results of part 1 to construct a 99% confidence interval for the difference $μ_{1} - μ_{2}$.
3. Test, at the 1% level of significance, the hypothesis that the mean survival time for males exceeds that for females by more than 182 days (half a year).
4. Compute the observed significance of the test in part 3.

Answers

1. $(4.20, 5.80)$,
2. $(-18.54, -9.46)$
1. $(-12.81, -10.39)$,
2. $(-76.50, -68.10)$
1. Z = 8.753, $\pm z_{0.025} = \pm 1.960$ , reject H₀, p-value = 0.0000;
2. $Z = -0.687$, $-z_{0.10} = -1.282$, do not reject H₀, p-value = 0.2451
1. Z = 2.444, $\pm z_{0.005} = \pm 2.576$ , do not reject H₀, p-value = 0.0146.
2. Z = 1.702, $z_{0.05} = 1.645$ , reject H₀, p-value = 0.0446
1. $Z = -1.19$, p-value = 0.1170, do not reject H₀;
2. $Z = -0.92$, p-value = 0.3576, do not reject H₀
1. Z = 2.68, p-value = 0.0037, reject H₀;
2. $Z = -1.34$, p-value = 0.1802, do not reject H₀

1. $0.2 \pm 0.4$ ,
2. Z = 1.360, $z_{0.01} = 2.326$ , do not reject H₀ (not greater)
3. p-value = 0.0869
1. $5.2 \pm 1.9$ ,
2. $Z = -1.466$, $-z_{0.050} = -1.645$, do not reject H₀ (exceeds by 6.3 or more)
3. p-value = 0.0708
1. Z = 3.888, $z_{0.01} = 2.326$ , reject H₀ (upperclassmen study more)
2. p-value = 0.0001
1. $5 \pm 1.8$ ,
2. Z = 4.454, $z_{0.01} = 2.326$ , reject H₀ (Test A is easier)
3. p-value = 0.0000
Z = 0.738, $\pm z_{0.025} = \pm 1.960$ , do not reject H₀ (no difference)
$Z = -1.398$, $-z_{0.10} = -1.282$, reject H₀ (more offense oriented)

1. $n_{1} = 419$, ${\bar{x}}_{1} = 1540.33$, $s_{1} = 205.40$, $n_{2} = 581$, ${\bar{x}}_{2} = 1520.38$, and $s_{2} = 217.34$.
2. $(-2.24, 42.15)$
3. $H_{0} : μ_{1} - μ_{2} = 0$ vs. $H_{a} : μ_{1} - μ_{2} > 0$. Test Statistic: Z = 1.48. Rejection Region: $[1.645, \infty)$. Decision: Fail to reject H₀.
1. $n_{1} = 65$, ${\bar{x}}_{1} = 665.97$, $s_{1} = 41.60$, $n_{2} = 75$, ${\bar{x}}_{2} = 455.89$, and $s_{2} = 63.22$.
2. $(187.06, 233.09)$
3. $H_{0} : μ_{1} - μ_{2} = 182$ vs. $H_{a} : μ_{1} - μ_{2} > 182$. Test Statistic: Z = 3.14. Rejection Region: $[2.33, \infty)$. Decision: Reject H₀.
4. p-value = 0.0008