BOOK M&M&C
CHAPTER 8: INFERENCE FOR PROPORTIONS
8.1 INFERENCE FOR A SINGLE PROPORTION
Proportion: the fraction of the total that possesses a certain attribute
LARGE SAMPLE CONFIDENCE INTERVAL
Inference about a population proportion $p$ from an SRS of size $n$ is based on the sample proportion $\hat{p} = X/n$
o $X$ is the number of successes
o When $n$ is large, $\hat{p}$ is approximately Normal with mean $p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
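A quick simulation can make the Normal approximation concrete. This is an illustrative sketch under stated assumptions: each observation is taken to be an independent success with probability p (a very large population, or sampling with replacement), and the helper name `simulate_phat`, the values p = 0.3 and n = 100, and the 5000 repetitions are all hypothetical choices for the example.

```python
import math
import random
import statistics


def simulate_phat(p, n, reps=5000, seed=1):
    """Return `reps` simulated sample proportions p_hat = X/n.

    Assumption: each of the n observations is an independent success with
    probability p (large population or sampling with replacement).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    phats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)  # X = number of successes
        phats.append(x / n)
    return phats


p, n = 0.3, 100
phats = simulate_phat(p, n)
theory_sd = math.sqrt(p * (1 - p) / n)  # sigma_p_hat = sqrt(p(1-p)/n)

print("mean of p_hat:", round(statistics.mean(phats), 3))   # should be near p
print("sd of p_hat:  ", round(statistics.stdev(phats), 4))  # should be near theory_sd
print("theory sd:    ", round(theory_sd, 4))
```

The simulated mean and standard deviation should land close to $p = 0.3$ and $\sqrt{p(1-p)/n} \approx 0.0458$.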
The level C large-sample confidence interval is $\hat{p} \pm m$
o Recommended for 90%, 95%, and 99% confidence when the number of successes and the number of failures are both at least 10 (so the Normal approximation is adequate) and the data come from a random sample
For large samples, the margin of error for confidence level C is $m = z^* SE_{\hat{p}}$
o The critical value $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and $z^*$
Standard error of $\hat{p}$: $SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$
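A minimal sketch of the large-sample interval, using Python's standard-library `statistics.NormalDist` to get $z^*$. The function names `z_star` and `large_sample_ci` and the example counts (40 successes out of 100) are hypothetical; the function refuses to run when the "at least 10 successes and 10 failures" guideline is not met.

```python
import math
from statistics import NormalDist


def z_star(conf):
    """Critical value z* with area `conf` between -z* and z* under N(0, 1)."""
    return NormalDist().inv_cdf((1 + conf) / 2)


def large_sample_ci(x, n, conf=0.95):
    """Large-sample level-C interval p_hat +/- z* SE for one proportion."""
    if x < 10 or n - x < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error uses p_hat
    m = z_star(conf) * se                   # margin of error
    return p_hat - m, p_hat + m


print(round(z_star(0.90), 3), round(z_star(0.95), 3), round(z_star(0.99), 3))  # 1.645 1.96 2.576
print(large_sample_ci(40, 100))  # roughly (0.304, 0.496)
```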
SMALLER SAMPLE SIZE
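Alternative procedures such as the plus four estimate of the population proportion are recommended: $\tilde{p} = \frac{X+2}{n+4}$ (add four imaginary observations, two successes and two failures)
o The interval is centered at $\tilde{p}$
o Standard error: $SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}}$
o Use the large-sample confidence interval, but with $\tilde{p}$ in place of $\hat{p}$ and $n+4$ in place of $n$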
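The plus four recipe is a one-line change to the large-sample interval. In this sketch the name `plus_four_ci` and the example data (3 successes in 20 trials, too few successes for the large-sample interval) are hypothetical.

```python
import math
from statistics import NormalDist


def plus_four_ci(x, n, conf=0.95):
    """Plus four interval for one proportion.

    Adds 2 imaginary successes and 2 imaginary failures, then applies the
    large-sample formula with p_tilde in place of p_hat and n + 4 in place of n.
    """
    p_tilde = (x + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return p_tilde - z * se, p_tilde + z * se


print(plus_four_ci(3, 20))  # roughly (0.046, 0.371), centered at 5/24
```

This sketch does not clip the bounds to [0, 1], so a very small count can produce a slightly negative lower limit.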
SIGNIFICANCE TEST (hypothesis)
Tests of $H_0: p = p_0$ are based on the z statistic $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
o P-values are calculated from the N(0,1) distribution. Use this procedure when the expected number of successes $np_0$ and the expected number of failures $n(1-p_0)$ are both at least 10
o Find the P-value as the probability of a z statistic at least as extreme as the one observed, in the direction specified by the alternative hypothesis (for $H_a: p \ne p_0$, double the one-tail area)
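A sketch of the one-proportion z test. The function name `one_prop_z_test`, the `alternative` strings, and the example (60 successes in 100 trials, testing $p_0 = 0.5$) are hypothetical. Note that the standard error in the test uses $p_0$, not $\hat{p}$.

```python
import math
from statistics import NormalDist


def one_prop_z_test(x, n, p0, alternative="two-sided"):
    """z test of H0: p = p0; alternative is 'greater', 'less', or 'two-sided'."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("expected successes and failures must both be >= 10")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE computed under H0
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p > p0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p < p0
        p_value = std_normal.cdf(z)
    else:                             # Ha: p != p0, double the tail area
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


print(one_prop_z_test(60, 100, 0.5))  # z = 2.0, two-sided P about 0.046
```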
CHOOSING SAMPLE SIZE
The sample size required to obtain a confidence interval with approximate margin of error m for a proportion is $n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$
o $p^*$ is a guessed value for the proportion of successes in the future sample
o $z^*$ is the standard Normal critical value for the desired level of confidence
o To ensure that the margin of error of the interval is less than or equal to m regardless of the true proportion, guess $p^* = 0.5$; then $n = \frac{1}{4}\left(\frac{z^*}{m}\right)^2$
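The sample-size formula rarely gives a whole number, so the usual practice is to round up. A minimal sketch; the name `sample_size_one_prop` and the example margin of error 0.03 are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_one_prop(m, conf=0.95, p_star=0.5):
    """Smallest n with z* * sqrt(p*(1 - p*)/n) <= m (always round up)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    return math.ceil((z / m) ** 2 * p_star * (1 - p_star))


print(sample_size_one_prop(0.03))              # 1068 with the conservative p* = 0.5
print(sample_size_one_prop(0.03, p_star=0.2))  # 683 with a guess of p* = 0.2
```

Because $p^*(1-p^*)$ is largest at $p^* = 0.5$, any other guess gives a smaller n.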
Software can be used to determine the sample sizes needed for significance tests (power)
8.2 COMPARING TWO PROPORTIONS ($p_1$ and $p_2$)
LARGE SAMPLE CONFIDENCE INTERVAL
The large-sample estimate of the difference in two population proportions is $D = \hat{p}_1 - \hat{p}_2$
o $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions: $\hat{p}_1 = X_1/n_1$ and $\hat{p}_2 = X_2/n_2$
o When both samples are large, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately Normal with mean $p_1 - p_2$
The large-sample level C confidence interval is $D \pm m$
o Recommended for 90%, 95%, or 99% confidence when the number of successes and the number of failures in each sample are at least 10 and both samples are random samples
The margin of error for confidence level C is $m = z^* SE_D$
o $z^*$ is the value for the standard Normal density curve with area C between $-z^*$ and $z^*$
The standard error of the difference D is $SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
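A sketch of the large-sample interval for $p_1 - p_2$ with the unpooled standard error $SE_D$. The function name `two_prop_ci` and the example counts (60 of 100 versus 45 of 100) are hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample level-C interval for p1 - p2 (unpooled SE_D)."""
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("need at least 10 successes and 10 failures in each sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    d = p1_hat - p2_hat
    se_d = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    m = NormalDist().inv_cdf((1 + conf) / 2) * se_d  # margin of error
    return d - m, d + m


print(two_prop_ci(60, 100, 45, 100))  # roughly (0.013, 0.287)
```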
PLUS-FOUR CONFIDENCE INTERVAL
For smaller samples (each sample size at least 5), the plus four estimate of the difference in two population proportions is recommended
Add two imaginary observations, one success and one failure, to each of the two samples: $\tilde{p}_1 = \frac{X_1+1}{n_1+2}$ and $\tilde{p}_2 = \frac{X_2+1}{n_2+2}$
Confidence interval: $(\tilde{p}_1 - \tilde{p}_2) \pm z^* \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}}$
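A sketch of the two-sample plus four interval, adding one success and one failure to each sample. The name `plus_four_two_prop_ci` and the example counts (4 of 10 versus 1 of 12) are hypothetical.

```python
import math
from statistics import NormalDist


def plus_four_two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Plus four interval for p1 - p2: one extra success and failure per sample."""
    if n1 < 5 or n2 < 5:
        raise ValueError("each sample size should be at least 5")
    p1_t = (x1 + 1) / (n1 + 2)
    p2_t = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1_t * (1 - p1_t) / (n1 + 2) + p2_t * (1 - p2_t) / (n2 + 2))
    m = NormalDist().inv_cdf((1 + conf) / 2) * se
    d = p1_t - p2_t
    return d - m, d + m


print(plus_four_two_prop_ci(4, 10, 1, 12))  # roughly (-0.060, 0.608)
```

An interval that contains 0, as here, is consistent with no difference between the two proportions at that confidence level.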
SIGNIFICANCE TEST (hypothesis)
Significance tests comparing two proportions ($H_0: p_1 = p_2$) use the z statistic $z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{Dp}}$
o The pooled estimate of the common value of $p_1$ and $p_2$ is $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$
o The pooled standard error is $SE_{Dp} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
o Use this test when the number of successes and the number of failures in each of the samples are at least 5, the samples are SRSs, and each population is at least 10 times as large as its sample
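A sketch of the two-proportion z test with the pooled standard error. The function name `two_prop_z_test`, its `alternative` strings, and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """z test of H0: p1 = p2 using the pooled standard error SE_Dp."""
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("need at least 5 successes and 5 failures in each sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common p
    se_dp = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_dp
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p1 > p2
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p1 < p2
        p_value = std_normal.cdf(z)
    else:                             # Ha: p1 != p2
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


print(two_prop_z_test(60, 100, 45, 100))  # z about 2.12, two-sided P about 0.034
```

The pooled $SE_{Dp}$ is used here because the test assumes $p_1 = p_2$ under $H_0$; the confidence interval uses the unpooled $SE_D$ instead.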
Sample size (for each of the two samples) for a desired margin of error m is $n = \frac{1}{2}\left(\frac{z^*}{m}\right)^2$ if $p_1^* = p_2^* = 0.5$
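With guesses $p_1^*$ and $p_2^*$, the per-sample size is $n = \left(\frac{z^*}{m}\right)^2\left[p_1^*(1-p_1^*) + p_2^*(1-p_2^*)\right]$, which reduces to $\frac{1}{2}(z^*/m)^2$ when both guesses are 0.5. A minimal sketch; the name `sample_size_two_prop` and the margin of error 0.05 are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_prop(m, conf=0.95, p1_star=0.5, p2_star=0.5):
    """Per-sample n so the margin of error for p1 - p2 is at most m (round up)."""
    z = NormalDist().inv_cdf((1 + conf) / 2)
    spread = p1_star * (1 - p1_star) + p2_star * (1 - p2_star)
    return math.ceil((z / m) ** 2 * spread)


print(sample_size_two_prop(0.05))  # 769 in each sample when both guesses are 0.5
```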
Relative risk is the ratio of the two sample proportions: $RR = \hat{p}_1/\hat{p}_2$ (for SPSS)
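A short sketch of the relative risk computation. The name `relative_risk` and the example counts (30 of 200 versus 10 of 200) are hypothetical.

```python
def relative_risk(x1, n1, x2, n2):
    """RR = p1_hat / p2_hat; undefined when the second proportion is 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    if p2_hat == 0:
        raise ValueError("relative risk is undefined when p2_hat = 0")
    return p1_hat / p2_hat


print(relative_risk(30, 200, 10, 200))  # about 3.0
```

Here RR = 3 means the sample proportion in group 1 is three times that in group 2; RR = 1 would mean the two proportions are equal.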