MMC Chapter 8: Proportions
➔ Binomial distribution review from 1.3
◆ Conditions:
● Fixed number of observations n
● n observations are all independent
● Each observation falls into one of the two categories “success” or
“failure”
● Probability of success (p) is the same for all observations
◆ Examples: coin toss, yes/no survey
◆ Notation: B(n,p) denotes the binomial distribution with n observations and probability of success p
◆ Binomial distributions are important when we want to make inferences about
the proportion p of successes in a population
◆ Generally, we use the binomial sampling distribution for counts when the population
is at least 20 times as large as the sample
◆ If a count X has the binomial distribution B(n,p), then:
$$\mu_X = np, \qquad \sigma_X = \sqrt{np(1-p)}$$
◆ only the count X has a binomial distribution; the sample proportion p^ = X/n does not!
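The mean and standard deviation formulas for B(n,p) can be checked numerically. This is a minimal sketch: the helper names `binomial_pmf` and `binomial_mean_sd` and the values n = 20, p = 0.3 are hypothetical choices for illustration. It compares the closed forms with a direct sum over the distribution and shows what dividing by n does to them.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a count X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Closed-form mean np and standard deviation sqrt(np(1 - p)) of B(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))


n, p = 20, 0.3  # hypothetical example values
mean, sd = binomial_mean_sd(n, p)

# Cross-check the closed forms by summing over the whole distribution.
mean_direct = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_direct = sum((k - mean_direct) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
print(mean, sd)
print(mean_direct, math.sqrt(var_direct))

# p^ = X/n is the count divided by n: it has mean p and sd sqrt(p(1 - p)/n),
# but its values 0, 1/n, ..., 1 are not counts, so p^ itself is not binomial.
print(mean / n, sd / n)
```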
_____________
8.1 Inference for a single proportion
➔ we record counts or proportions when we collect data about a categorical variable from
a population
➔ we draw a simple random sample (SRS) from the population
➔ the sample proportion p^= X/n estimates the unknown population proportion p
➔ if the population is at least 20 times as large as the sample, then the count X has a
binomial distribution B(n,p)
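➔ When the sample size n is sufficiently large, the sampling distribution of p^ is
approximately normal with mean and standard deviation
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
➔ however, we don’t know the population proportion p, so we replace it with p^ in this
formula; the result is called the standard error:
$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$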
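A short simulation illustrates the sampling distribution of p^. This is a sketch that assumes each observation is an independent success with probability p, which matches the large-population case above. The function `simulate_p_hat`, the seed, and the values n = 100, p = 0.4 are hypothetical. The simulated mean and standard deviation of p^ should land close to p and sqrt(p(1-p)/n). The last lines show the standard error computed from a single sample.

```python
import math
import random
import statistics


def simulate_p_hat(n: int, p: float, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of n independent trials; return each sample proportion X/n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


n, p = 100, 0.4  # hypothetical population proportion and sample size
p_hats = simulate_p_hat(n, p, reps=5000)

# Compare the simulation with the theoretical mean p and sd sqrt(p(1 - p)/n).
print(statistics.fmean(p_hats), p)
print(statistics.stdev(p_hats), math.sqrt(p * (1 - p) / n))

# In practice we only see one sample and do not know p, so we plug in p^:
p_hat = p_hats[0]
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p^
print(p_hat, se)
```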
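➔ The large-sample level C confidence interval is
$$\hat{p} \pm m, \qquad m = z^* SE_{\hat{p}}$$
where z* is the standard normal critical value for confidence level C. We use it for 90%, 95%,
and 99% confidence whenever the number of successes and the number of failures are both at least 10.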
➔ For smaller sample sizes, we recommend exact methods that use the binomial
distribution.
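The large-sample interval and its "at least 10 successes and 10 failures" rule can be sketched as follows. The function name `large_sample_ci` and the example counts are hypothetical. z* comes from the standard library's `statistics.NormalDist`. The sketch refuses small samples rather than silently returning an unreliable interval; an exact binomial method would be the alternative there and is not shown.

```python
import math
from statistics import NormalDist


def large_sample_ci(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Large-sample interval p^ +/- z* sqrt(p^(1 - p^)/n).

    Raises ValueError when there are fewer than 10 successes or 10 failures,
    the rule of thumb for when this interval may be used.
    """
    if x < 10 or n - x < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = x / n
    z_star = NormalDist().inv_cdf((1 + conf) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error of p^
    m = z_star * se                                # margin of error
    return p_hat - m, p_hat + m


# Hypothetical data: 56 successes in 100 observations.
print(large_sample_ci(x=56, n=100, conf=0.95))
```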
➔ There is also an intermediate case between large samples and very small samples where
a slight modification of the large-sample approach works quite well. This method is
called the “plus four” procedure:
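➔ We add 4 observations to the sample, 2 successes and 2 failures, which gives the plus four estimate
$$\tilde{p} = \frac{X+2}{n+4}$$
and then use the large-sample interval with p~ in place of p^ and n + 4 in place of n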
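A minimal sketch of the plus four procedure adds 2 successes and 2 failures and then applies the large-sample formula to the enlarged sample of size n + 4. The function name `plus_four_ci` and the counts (2 successes in 20 trials) are hypothetical. This example has too few successes for the large-sample interval, yet the plus four interval is still usable.

```python
import math
from statistics import NormalDist


def plus_four_ci(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Plus four interval: p~ +/- z* sqrt(p~(1 - p~)/(n + 4)) with p~ = (x + 2)/(n + 4)."""
    p_tilde = (x + 2) / (n + 4)                      # 2 extra successes out of 4 extra trials
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    m = z_star * se
    return p_tilde - m, p_tilde + m


# Hypothetical data: 2 successes in 20 trials (fewer than 10 successes).
print(plus_four_ci(x=2, n=20))
```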
➔ Significance test for a single proportion:
◆ the distribution of the sample proportion p^ is approximately normal; to construct confidence
intervals, we substitute p^ in place of p to obtain the standard error (and use it
for the margin of error)
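◆ however, in significance testing we assume that the value of p given by the null
hypothesis, H0: p = p0, is true, so the standard deviation of p^ is computed from p0
◆ the z test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$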
◆ in problems such as deciding which of two products is better, two-sided tests should be used,
because we cannot claim beforehand, on scientific grounds, that one product is superior to the
other (e.g., for advertising purposes)
◆ we often don’t conduct significance tests for a single proportion, because there is often
no single natural value p0 to test; when there is one, it typically comes from chance devices
(e.g., coin tossing, drawing cards) or from proportions reported in previous studies
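The single-proportion z test can be sketched as follows. Unlike the confidence interval, it uses p0 in the standard deviation. The function name `one_proportion_z_test` and the coin-toss counts are hypothetical. The P-value is two-sided, matching the recommendation above.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x: int, n: int, p0: float) -> tuple[float, float]:
    """z test of H0: p = p0 against Ha: p != p0; returns (z, two-sided P-value)."""
    p_hat = x / n
    sd0 = math.sqrt(p0 * (1 - p0) / n)   # uses the null value p0, not p^
    z = (p_hat - p0) / sd0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 heads in 100 tosses, testing a fair coin (p0 = 0.5).
z, p_value = one_proportion_z_test(x=60, n=100, p0=0.5)
print(z, p_value)
```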
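➔ choosing a sample size for a confidence interval:
◆ the sample size needed for a desired margin of error m is
$$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$$
where p* is a guessed value for the proportion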
◆ we aim to pick a specific sample size for our desired margin of error
◆ chosen confidence level determines the z-value
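◆ we don’t know p^ yet because we haven’t collected any data, so we use a guessed value p* in its place:
● we can use p^ from a previous similar study as p*
● we can take p* = 0.5: p*(1 - p*) is largest at 0.5, so this gives the largest margin of error
for a given n, and the resulting n is at least as large as we actually need (a safe choice)
◆ then we can calculate n, rounding up to the next whole number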
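The sample-size calculation can be sketched with the hypothetical helper `sample_size_for_margin`. The margin m = 0.03 and the alternative guess p* = 0.2 are illustrative values, not from the notes. The result is rounded up, because rounding down would give a margin of error slightly larger than m.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(m: float, conf: float = 0.95, p_star: float = 0.5) -> int:
    """Smallest whole n with z* sqrt(p*(1 - p*)/n) <= m, i.e. ceil((z*/m)^2 p*(1 - p*)).

    p_star defaults to 0.5, the conservative guess that maximizes p*(1 - p*).
    """
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    n = (z_star / m) ** 2 * p_star * (1 - p_star)
    return math.ceil(n)


print(sample_size_for_margin(m=0.03))               # conservative guess p* = 0.5
print(sample_size_for_margin(m=0.03, p_star=0.2))   # guess taken from a previous study
```

With any guess other than 0.5, the required n is smaller. That is why p* = 0.5 is the safe default when no earlier estimate is available.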
8.2 Comparing two proportions
➔ now we compare two proportions from 2 populations
➔ the difference between 2 sample proportions: D=p^1-p^2
➔ when both sample sizes are large, the sampling distribution of the difference D is approximately
normal
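➔ mean of D (addition rule for means):
$$\mu_D = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$$
➔ standard deviation of D (the two samples are independent, so the variances add):
$$\sigma_D = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
➔ Confidence interval for a difference in proportions (replace p1 and p2 by p^1 and p^2 to get the standard error):
$$D \pm z^* SE_D, \qquad SE_D = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$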
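A minimal sketch of the large-sample interval for p1 - p2 builds the standard error from p^1 and p^2, as for a single proportion. The function name `two_proportion_ci` and the counts (60 of 200 vs. 45 of 200) are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1: int, n1: int, x2: int, n2: int,
                      conf: float = 0.95) -> tuple[float, float]:
    """Large-sample interval D +/- z* SE_D for p1 - p2, where D = p^1 - p^2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    d = p1_hat - p2_hat
    se_d = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    m = z_star * se_d
    return d - m, d + m


# Hypothetical counts: 60 successes out of 200 vs. 45 successes out of 200.
print(two_proportion_ci(60, 200, 45, 200))
```

In this made-up example the interval contains 0, so the data are consistent with no difference between the two population proportions at 95% confidence.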