Statistic 7.2
7.2 Comparing Two Means
Two-sample problems
o The goal of inference is to compare the responses of two
groups
o Each group is considered to be a sample from a distinct
population
o The responses in each group are independent of those in
the other group
A two-sample problem can arise from a randomized
comparative experiment that randomly divides the subjects
into two groups and exposes each group to a different
treatment
comparing random samples separately selected from two
populations is also a two-sample problem
no matching of the units in the two samples
the two samples might be of different sizes
we can present two-sample data in a back-to-back stemplot or
side-by-side boxplot
we have two independent samples, from two distinct
populations
we can compare the two population means, either by giving a
CI for μ1−μ2 or by testing the hypothesis of no difference,
H0: μ1 = μ2
The two-sample z statistic
μ1 − μ2 is estimated by the difference between the sample means, x̄1 − x̄2
the variance of the difference x̄1 − x̄2 is
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
large samples are needed to see the effects of small
differences
two-sample z statistic – suppose that x̄1 is the mean of an
SRS of size n1 drawn from an N(μ1, σ1) population and that
x̄2 is the mean of an independent SRS of size n2 drawn
from an N(μ2, σ2) population. Then the two-sample z statistic
o $$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
o has the standard Normal N(0,1) sampling distribution
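As a minimal sketch of the z statistic above, the following computes z and a two-sided p-value using only the Python standard library. The function name `two_sample_z`, the parameter `delta0` (the hypothesized value of μ1 − μ2), and the data are illustrative assumptions, not part of the notes.

```python
import math
from statistics import NormalDist, mean


def two_sample_z(x1, x2, sigma1, sigma2, delta0=0.0):
    """Two-sample z statistic for H0: mu1 - mu2 = delta0 (sigmas known)."""
    n1, n2 = len(x1), len(x2)
    # Standard deviation of xbar1 - xbar2: sqrt(sigma1^2/n1 + sigma2^2/n2)
    sd_diff = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (mean(x1) - mean(x2) - delta0) / sd_diff
    # Two-sided p-value from the standard Normal N(0,1) distribution
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Made-up data for illustration only
group1 = [12.0, 14.0, 13.0, 15.0]
group2 = [10.0, 11.0, 12.0, 11.0]
z, p = two_sample_z(group1, group2, sigma1=2.0, sigma2=2.0)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

In practice the population standard deviations are rarely known, which is why this statistic mainly serves as a stepping stone to the t procedures.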
The two-sample t procedures
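σ1 and σ2 are not known, so we substitute the sample standard deviations s1 and s2:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$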
this statistic does not have a t distribution, because we have
replaced two standard deviations by their estimates
t(k) distribution – we can approximate the distribution of the two-sample t statistic by a t(k) distribution, using an approximation for the degrees of freedom k
we use these approximations to find approximate values of t*
for CIs and to find approximate p-values for significance tests
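The notes do not say which approximation is used for k. One widely used choice in statistical software is the Welch-Satterthwaite formula, so the sketch below assumes that formula. The function name `welch_df` and the data are hypothetical.

```python
from statistics import variance


def welch_df(x1, x2):
    """Welch-Satterthwaite approximation to the degrees of freedom k.

    Assumption: this is one common approximation; the notes do not
    name the formula they have in mind.
    """
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1  # s1^2 / n1
    v2 = variance(x2) / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Made-up data for illustration only
group1 = [12.0, 14.0, 13.0, 15.0]
group2 = [10.0, 11.0, 12.0, 11.0, 13.0, 9.0]
k = welch_df(group1, group2)
print(f"approximate degrees of freedom k = {k:.2f}")
```

The resulting k is generally not a whole number, and it always lies between the smaller of n1 − 1 and n2 − 1 and the total n1 + n2 − 2.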
The two-sample t significance test
the two-sample t significance test – suppose that an SRS of
size n1 is drawn from a Normal population with unknown
mean μ1 and that an independent SRS of size n2 is drawn
from another Normal population with unknown mean μ2. To
test the hypothesis H0: μ1 = μ2, compute the two-sample t
statistic
o $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
o and use p-values or critical values for the t(k)
distribution, where the degrees of freedom k are either
approximated by software or are the smaller of
n1 − 1 and n2 − 1
conservative inference procedures for comparing μ1 and μ2 are
obtained from the two-sample t statistic by using the t(k)
distribution with degrees of freedom k equal to the smaller of
n1 − 1 and n2 − 1
more accurate probability values can be obtained by
estimating the degrees of freedom from the data. This is the
usual procedure for statistical software
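The test can be sketched in a few lines. This version uses the conservative choice of k (the smaller of n1 − 1 and n2 − 1) and, to stay within the standard library, gets the t(k) tail area by numerically integrating the density with Simpson's rule. All function names and the data are illustrative assumptions; a statistics package would normally supply the tail probability directly.

```python
import math
from statistics import mean, variance


def t_pdf(x, k):
    """Density of the t(k) distribution at x."""
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
    c = math.exp(log_c) / math.sqrt(k * math.pi)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)


def t_upper_tail(t, k, steps=2000):
    """P(T >= |t|) for T ~ t(k), via Simpson's rule on [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 0.5
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(b, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 - total * h / 3


def two_sample_t_test(x1, x2):
    """Two-sided test of H0: mu1 = mu2 with conservative degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(variance(x1) / n1 + variance(x2) / n2)
    t = (mean(x1) - mean(x2)) / se
    k = min(n1, n2) - 1  # conservative choice of k
    p_two_sided = 2 * t_upper_tail(t, k)
    return t, k, p_two_sided


# Made-up data for illustration only
group1 = [12.0, 14.0, 13.0, 15.0]
group2 = [10.0, 11.0, 12.0, 11.0, 13.0, 9.0]
t, k, p = two_sample_t_test(group1, group2)
print(f"t = {t:.3f}, k = {k}, two-sided p = {p:.4f}")
```

For these data t is about 2.89 with k = 3, and the two-sided p-value falls between 0.05 and 0.10. Using software-estimated degrees of freedom would give a somewhat smaller p-value, which is the sense in which the conservative choice is conservative.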
The two-sample t confidence interval
the two-sample t confidence interval – suppose that an SRS of
size n1 is drawn from a Normal population with unknown
mean μ1 and that an independent SRS of size n2 is drawn
from another Normal population with unknown mean μ2.
The confidence interval for μ1 − μ2 given by
o $$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
o has confidence level at least C no matter what the
population standard deviations may be. Here t* is the
value for the t(k) density curve with area C between −t* and t*
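A minimal sketch of the interval computation follows. It assumes the critical value t* is looked up in a t table and passed in by the caller. The function name `two_sample_t_interval` and the data are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t_interval(x1, x2, t_star):
    """Confidence interval for mu1 - mu2 using a supplied critical value t*."""
    n1, n2 = len(x1), len(x2)
    diff = mean(x1) - mean(x2)
    # Margin of error: t* times sqrt(s1^2/n1 + s2^2/n2)
    margin = t_star * math.sqrt(variance(x1) / n1 + variance(x2) / n2)
    return diff - margin, diff + margin


# Made-up data for illustration only
group1 = [12.0, 14.0, 13.0, 15.0]
group2 = [10.0, 11.0, 12.0, 11.0, 13.0, 9.0]
# Conservative k = min(4, 6) - 1 = 3; the t table gives t* = 3.182 for C = 95%
low, high = two_sample_t_interval(group1, group2, t_star=3.182)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

With the conservative k, the interval is wider than one based on software-estimated degrees of freedom, which is consistent with the "at least C" guarantee.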