8.2. Constructing a confidence interval to estimate
a population proportion
Finding the 95% confidence interval for a population proportion
- Population proportion is p.
- The point estimate of the population proportion is the sample proportion, which is ^p
- For large random samples, the Central Limit Theorem tells us that the sampling distribution
of the sample proportion ^p is approximately normal.
- The margin of error for a 95% confidence interval with a normal sampling distribution is
1.96 × (standard deviation).
o There is about a 95% chance that ^p falls within 1.96 standard deviations of the
population proportion p (the mean of the sampling distribution of ^p).
- A 95% confidence interval is given by [point estimate ± margin of error], which becomes
^p ± 1.96 × (standard deviation)
- The standard deviation of a sample proportion equals
o $\sqrt{p(1-p)/n}$
This formula depends on the unknown population proportion p. In practice, we
don’t know p, and we need to estimate it to compute the standard deviation.
- The standard error is an estimated standard deviation of a sampling distribution. We will use
se as shorthand for standard error.
o For example, the standard error for the sample proportion is given by
$se = \sqrt{\hat{p}(1-\hat{p})/n}$
We use it to compute the confidence interval for a population proportion p.
A 95% confidence interval for a population proportion p is ^p ± 1.96(se), with
$se = \sqrt{\hat{p}(1-\hat{p})/n}$,
where ^p denotes the sample proportion based on n observations.
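The formula above can be sketched in a few lines of Python. This is only an illustration: the function name proportion_ci_95 and the data (400 successes in 1000 observations) are made up for the example and are not from the notes.

```python
import math

def proportion_ci_95(successes, n):
    """Return (p_hat, se, lower, upper) for a 95% confidence interval."""
    p_hat = successes / n                    # sample proportion (point estimate)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    margin = 1.96 * se                       # margin of error at 95% confidence
    return p_hat, se, p_hat - margin, p_hat + margin

# Hypothetical data: 400 successes in 1000 observations
p_hat, se, lower, upper = proportion_ci_95(400, 1000)
print(f"p_hat = {p_hat:.3f}, se = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

With these made-up numbers the interval comes out at roughly (0.370, 0.430).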
Sample size needed for validity of confidence interval for a proportion
- The confidence interval formula ^p ±1.96 (se ) applies with large random samples.
o This is because the sampling distribution of the sample proportion ^p is then
approximately normal and the se then also tends to be a good estimate of the
standard deviation, allowing us to use the z-score of 1.96 from the normal
distribution.
- In practice, “large” means you should have at least 15 successes and at least 15 failures for
the binary outcome.
- SUMMARY: for the 95% confidence interval ^p ±1.96 (se ) for a proportion p to be valid, you
should have at least 15 successes and 15 failures.
o This can also be expressed as n ^p ≥15 and n(1− ^p )≥ 15
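As a quick sketch, the "at least 15 successes and 15 failures" guideline can be checked before computing the interval. The helper name large_enough and the counts below are hypothetical.

```python
def large_enough(successes, n, minimum=15):
    """Check the guideline: at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Hypothetical counts
print(large_enough(400, 1000))  # 400 successes, 600 failures
print(large_enough(8, 1000))    # only 8 successes, so the guideline fails
```

Because successes = n × ^p and failures = n × (1 − ^p), this is the same condition as n ^p ≥ 15 and n(1 − ^p) ≥ 15.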
Using a confidence level other than 95%
- In practice, the confidence level of 0.95 is the most common choice. Some applications,
such as medical research, require greater confidence.
- To increase the chance of a correct inference (that is, having the interval contain the
parameter value), we use a larger confidence level, such as 0.99.
- Now, 99% of the normal sampling distribution for the sample proportion ^p occurs within
2.58 standard errors of the population proportion p.
o A 99% confidence interval for p is ^p ±2.58 (se )
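To see what the larger multiplier does, the sketch below computes the 95% and 99% intervals for the same sample. The function name proportion_ci and the data (400 successes in 1000 observations) are assumptions made for illustration.

```python
import math

def proportion_ci(successes, n, z):
    """Confidence interval p_hat ± z * se for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical data, same sample for both confidence levels
ci95 = proportion_ci(400, 1000, 1.96)
ci99 = proportion_ci(400, 1000, 2.58)
print(f"95% CI: ({ci95[0]:.3f}, {ci95[1]:.3f})")
print(f"99% CI: ({ci99[0]:.3f}, {ci99[1]:.3f})")
```

For this made-up sample the 99% interval, roughly (0.360, 0.440), is wider than the 95% interval, roughly (0.370, 0.430): more confidence costs some precision.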
Why settle for anything less than 100% confidence?
- To have a 100% confidence interval, it must contain all possible values for the parameter.
For a proportion, it would run from 0.0 to 1.0 (0% to 100%), which isn’t helpful.
- In practice, we settle for a little less than perfect confidence so we can estimate the
parameter value more precisely.
- In using confidence intervals, we must compromise between the desired margin of error and
the desired confidence of a correct inference. As one gets better, the other gets worse.
Error probability for the confidence interval method
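- The general formula for the confidence interval for a population proportion is
sample proportion ± (z-score from normal table)(standard error), which in symbols is
^p ± z(se)
- The z-score depends on the confidence level.
- The error probability is the probability that the method gives an incorrect inference (an
interval that misses the parameter). It equals 1 − confidence level.
- To find the z-score for a confidence interval, take the tail probability
(1 − confidence level)/2 and find the z-score that leaves that probability in each tail of
the normal distribution. For example, for 95% confidence the tail probability is
(1 − 0.95)/2 = 0.025, which gives z = 1.96.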
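Instead of a normal table, the z-score can be looked up in software. The sketch below assumes Python 3.8 or later and uses NormalDist from the standard library. The function name z_for_confidence is made up for this example.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z-score such that the central `level` of the standard normal lies within ±z."""
    error_probability = 1 - level      # chance the interval misses the parameter
    tail = error_probability / 2       # probability in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
```

This reproduces the multipliers used in these notes: 1.96 for 95% confidence and 2.58 for 99% confidence.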
Effect of the sample size
- The margin of error is
o $z(se) = z\sqrt{\hat{p}(1-\hat{p})/n}$
This margin decreases as sample size n increases, for a given value of ^p. The larger
the value of n, the narrower the interval.
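A short sketch of this effect, holding ^p fixed and increasing n. The value ^p = 0.5, the sample sizes, and the function name margin_of_error are chosen only for illustration.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Margin of error z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportion of 0.5 at increasing sample sizes
for n in (100, 400, 1600):
    print(n, f"{margin_of_error(0.5, n):.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error (here from about 0.098 to 0.049 to 0.0245).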