Statistic 8.2
Comparing Two Proportions
Two independent SRSs
Count of successes X
To compare the two populations, we use the difference
between two sample proportions
$D = \hat{p}_1 - \hat{p}_2$
When both sample sizes are sufficiently large, the sampling
distribution of the difference D is approximately Normal.
By the addition rule for means, the mean of D is the difference
of the means:
$\mu_D = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$
$D = \hat{p}_1 - \hat{p}_2$, the difference between the sample proportions, is
an unbiased estimator of the population difference $p_1 - p_2$.
The addition rule for variances tells us that, because the two samples
are independent, the variance of D is the sum of the variances:
$\sigma_D^2 = \sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2 = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$
Standard deviation of D:
$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
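The mean and standard deviation of D can be checked by simulation. The sketch below is only an illustration: the values p1 = 0.30, p2 = 0.20, n1 = n2 = 100, the number of repetitions, and the function names are assumptions chosen for the example, not values from the lecture.

```python
import math
import random

def simulate_difference(p1, n1, p2, n2, reps=4000, seed=0):
    """Simulate D = p1_hat - p2_hat for two independent samples.

    Returns the simulated mean and standard deviation of D.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in each independent sample.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    mean = sum(diffs) / reps
    var = sum((d - mean) ** 2 for d in diffs) / (reps - 1)
    return mean, math.sqrt(var)

def theoretical_sd(p1, n1, p2, n2):
    """Standard deviation of D from the addition rule for variances."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical population values, chosen only for illustration.
sim_mean, sim_sd = simulate_difference(0.30, 100, 0.20, 100)
print(sim_mean, sim_sd, theoretical_sd(0.30, 100, 0.20, 100))
```

The simulated mean should land close to p1 - p2 = 0.10, and the simulated standard deviation close to the value from the formula.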
Large-sample confidence interval for a difference in proportions
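For a confidence interval for the difference, we replace the
unknown parameters in the standard deviation by estimates
to obtain an estimated standard deviation, the standard error
$SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
o $D \pm z^* SE_D$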
Because it is easier to discuss positive numbers, we generally
choose the first population to be the one with the higher
proportion
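A minimal sketch of the large-sample interval, assuming Python 3.8+ for `statistics.NormalDist`. The function name `two_prop_ci` and the counts 60 of 100 and 45 of 100 are hypothetical, used only to show the arithmetic.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    d = p1_hat - p2_hat
    # Standard error: unknown p1, p2 replaced by the sample proportions.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return d - z_star * se, d + z_star * se

# Hypothetical counts; the first sample has the higher proportion.
low, high = two_prop_ci(60, 100, 45, 100)
print(round(low, 3), round(high, 3))  # roughly 0.013 and 0.287
```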
Plus four confidence interval for a difference in proportions
A small modification of the sample proportion can greatly
improve the accuracy of confidence intervals
The plus four estimates of the two population proportions are
$\tilde{p}_1 = \frac{X_1 + 1}{n_1 + 2}$ and $\tilde{p}_2 = \frac{X_2 + 1}{n_2 + 2}$
The estimated difference between the populations is
$\tilde{D} = \tilde{p}_1 - \tilde{p}_2$
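And the standard deviation of $\tilde{D}$ is approximately
$\sigma_{\tilde{D}} = \sqrt{\frac{p_1(1-p_1)}{n_1+2} + \frac{p_2(1-p_2)}{n_2+2}}$,
with standard error
$SE_{\tilde{D}} = \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}}$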
When an interval includes 0, it allows for the possibility that there is no
difference (p1 = p2, or p1 – p2 = 0), but we should
not conclude that there is no difference in the proportions.
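A sketch of the plus four interval, taken as the plus four difference plus or minus z* times its standard error. The function name `plus_four_ci` and the small-sample counts (8 of 12 and 4 of 10) are assumptions for illustration only; `statistics.NormalDist` needs Python 3.8+.

```python
import math
from statistics import NormalDist

def plus_four_ci(x1, n1, x2, n2, conf=0.95):
    """Plus four confidence interval for p1 - p2.

    Adds one success and one failure to each sample (four
    imaginary observations in total) before computing the interval.
    """
    p1_t = (x1 + 1) / (n1 + 2)
    p2_t = (x2 + 1) / (n2 + 2)
    d_t = p1_t - p2_t
    se_t = math.sqrt(p1_t * (1 - p1_t) / (n1 + 2)
                     + p2_t * (1 - p2_t) / (n2 + 2))
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return d_t - z_star * se_t, d_t + z_star * se_t

# Hypothetical small samples.
low, high = plus_four_ci(8, 12, 4, 10)
print(round(low, 3), round(high, 3))  # roughly -0.149 and 0.601
```

With these made-up counts the interval contains 0, so the data are compatible with no difference but also with a fairly large one.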
Significance test for a difference in proportions
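Under the null hypothesis H0: p1 = p2, both samples estimate a common
proportion p, which we estimate by
$\hat{p} = \frac{\text{number of successes in both samples}}{\text{number of observations in both samples}} = \frac{X_1 + X_2}{n_1 + n_2}$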
The estimate of p is called the pooled estimate because it
combines, or pools, the information from both samples
Pooled standard error:
$SE_{D_p} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$
z statistic:
$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{D_p}}$
The z test is based on the Normal approximation to the
binomial distribution
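The test can be sketched as follows. The notes do not state the alternative hypothesis, so the two-sided P-value here is an assumption, as are the function name `two_prop_z_test` and the counts 60 of 100 and 45 of 100.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """z test of H0: p1 = p2 using the pooled standard error.

    Returns (z, two-sided P-value). The two-sided alternative
    is an assumption; halve the P-value for a one-sided test
    in the direction of the observed difference.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts, for illustration only.
z, p_value = two_prop_z_test(60, 100, 45, 100)
print(round(z, 2), round(p_value, 3))  # roughly 2.12 and 0.034
```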
Relative risk
RR – relative risk
A relative risk of 1 means that the two proportions are equal
Relative risk is the ratio of two sample proportions:
$RR = \frac{\hat{p}_1}{\hat{p}_2}$
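A small sketch of the calculation. The function name `relative_risk` and the counts are hypothetical, and the zero check is an added safeguard rather than part of the definition.

```python
def relative_risk(x1, n1, x2, n2):
    """Relative risk: ratio of the two sample proportions."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    if p2_hat == 0:
        # The ratio is undefined when the second sample has no successes.
        raise ValueError("relative risk is undefined when x2 is 0")
    return p1_hat / p2_hat

# Hypothetical counts: 20 of 200 versus 10 of 200.
rr = relative_risk(20, 200, 10, 200)
print(rr)  # 2.0: the first sample proportion is twice the second

# Equal sample proportions give a relative risk of 1.
rr_equal = relative_risk(15, 100, 30, 200)
print(rr_equal)  # 1.0
```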
Lecture 25
Differences vs. Ratios
Assume that in one population (A), 5% have a disease. In a
second population (B), 10% have a disease. How can we
compare these proportions?
o “There is a difference of 5% between population A and
population B.”
o “The prevalence of the disease is twice as great in
population B as in population A”
There are several methods for comparing two proportions.