Important Concepts not on the AP Statistics Formula Sheet
Part I:
IQR = Q3 – Q1
Test for an outlier: an observation is an outlier if it lies more than 1.5(IQR) above Q3 or more than 1.5(IQR) below Q1. The calculator will run the test for you as long as you choose the boxplot that shows outliers in STATPLOT.
Linear transformation:
Addition (adding a constant to every value): affects center, NOT spread. Adds the constant to x̄, M, Q1, and Q3; IQR and σ do not change.
Multiplication (multiplying every value by a positive constant): affects both center and spread. Multiplies x̄, M, Q1, Q3, IQR, and σ by the constant.
When describing data: describe center, spread, and shape. Give a 5-number summary or the mean and standard deviation when necessary.
[Figure: histogram, fairly symmetrical and unimodal]
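A minimal Python sketch of the outlier test and the linear-transformation rules above. It assumes quartiles are found with the median-of-halves method (the median is left out of both halves when n is odd); other software may use a slightly different quartile rule. σ is computed with `pstdev` (population standard deviation). The data values and the names `quartiles`, `outliers`, and `summary` are hypothetical, for illustration only.

```python
from statistics import mean, median, pstdev

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method (assumed)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]  # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

def outliers(data):
    """Values more than 1.5(IQR) below Q1 or more than 1.5(IQR) above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def summary(data):
    """Center and spread statistics used in the transformation rules."""
    q1, m, q3 = quartiles(data)
    return {"mean": mean(data), "median": m, "Q1": q1, "Q3": q3,
            "IQR": q3 - q1, "sd": pstdev(data)}

data = [2, 4, 5, 6, 7, 8, 9, 30]           # hypothetical data
print(outliers(data))                       # [30]

base = summary(data)
shifted = summary([x + 10 for x in data])   # addition: center moves, spread does not
scaled = summary([3 * x for x in data])     # multiplication: center and spread both scale
for key in base:
    print(key, base[key], shifted[key], scaled[key])
```

Adding 10 moves the mean, median, Q1, and Q3 up by 10 while IQR and σ stay the same; multiplying by 3 triples every entry.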
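[Figures: histograms skewed right and skewed left; ogive (cumulative frequency); boxplot (with an outlier); stem-and-leaf plot; normal probability plot]
r: correlation coefficient. Measures the direction and strength of the linear relationship between two quantitative variables. Values close to 1 or -1 indicate a relationship that is very close to linear.
z-score: how many standard deviations an observation is from the mean, $z = \frac{x - \mu}{\sigma}$.
r²: coefficient of determination. Measures how well the model fits the data; close to 1 is a good fit. Interpretation: “percent of variation in y explained by the LSRL on x.”
Percentile: the 80th percentile means that 80% of the data is below that observation.
68-95-99.7 Rule for Normality: in a Normal distribution, about 68%, 95%, and 99.7% of the observations fall within 1, 2, and 3 standard deviations of the mean.
N(µ, σ): a Normal distribution with mean µ and standard deviation σ.
N(0, 1): the Standard Normal distribution.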
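The ideas above (how many standard deviations an observation is from the mean, percentiles, and the 68-95-99.7 rule) can be checked numerically. This Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8+). The N(100, 15) distribution and the value 130 are hypothetical examples, and `z_score` and `within_k_sd` are illustrative helper names.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def within_k_sd(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    std = NormalDist()  # N(0, 1), the Standard Normal
    return std.cdf(k) - std.cdf(-k)

scores = NormalDist(mu=100, sigma=15)   # hypothetical N(100, 15)
z = z_score(130, 100, 15)               # 130 is 2 standard deviations above the mean
below_130 = scores.cdf(130)             # proportion of observations below 130
p80 = scores.inv_cdf(0.80)              # 80th percentile: 80% of observations lie below it

print(z, round(below_130, 4), round(p80, 2))
for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))  # roughly 0.68, 0.95, 0.997
```

The percentile line mirrors the definition in the notes: `inv_cdf(0.80)` returns the value with 80% of the distribution below it.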
residual = observed – predicted = $y - \hat{y}$
LSRL: $\hat{y} = a + bx$
Slope of LSRL (b): the predicted change in y for every one-unit increase in x.
y-intercept of LSRL (a): the predicted value of y when x = 0.
Exponential model: $y = ab^x$; take the log of y to make the relationship linear.
Power model: $y = ax^b$; take the log of both x and y to make the relationship linear.
Explanatory variables explain changes in response variables.
EV (explanatory variable): x, independent
RV (response variable): y, dependent
Lurking variable: a variable that may influence the relationship between two variables. A lurking variable (LV) is not among the EVs.
Confounding: two variables are confounded when their effects on a response variable cannot be distinguished from each other.
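A from-scratch Python sketch of the regression ideas above: the LSRL slope and intercept (using the standard formulas b = r·sy/sx and a = ȳ − b·x̄), residuals, r, and r², plus the log transformations for the exponential and power models. The data sets and function names are hypothetical. The sketch assumes the x and y values are not all identical (so the standard deviations are nonzero) and, for the log models, that all values are positive.

```python
from math import log10
from statistics import mean, stdev

def lsrl(xs, ys):
    """Least-squares regression line y-hat = a + b*x, plus r and r^2."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    # r: sum of products of z-scores, divided by n - 1
    r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys)) / (n - 1)
    b = r * sy / sx          # slope
    a = y_bar - b * x_bar    # intercept: the LSRL passes through (x-bar, y-bar)
    return a, b, r, r ** 2

def residuals(xs, ys, a, b):
    """residual = observed - predicted, for each data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def fit_exponential(xs, ys):
    """y = a * b**x: log y = log a + (log b) x, so regress log y on x."""
    c0, c1, _, _ = lsrl(xs, [log10(y) for y in ys])
    return 10 ** c0, 10 ** c1

def fit_power(xs, ys):
    """y = a * x**b: log y = log a + b log x, so regress log y on log x."""
    c0, c1, _, _ = lsrl([log10(x) for x in xs], [log10(y) for y in ys])
    return 10 ** c0, c1

xs = [1, 2, 3, 4, 5]       # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]       # hypothetical response values
a, b, r, r2 = lsrl(xs, ys)
res = residuals(xs, ys, a, b)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r2:.3f}")
print("residuals:", [round(e, 2) for e in res])

exp_a, exp_b = fit_exponential([0, 1, 2, 3, 4], [3 * 2 ** x for x in range(5)])
pow_a, pow_b = fit_power([1, 2, 3, 4, 5], [2 * x ** 3 for x in range(1, 6)])
print(exp_a, exp_b, pow_a, pow_b)
```

The residuals of an LSRL always sum to zero (up to rounding), and the log fits recover a = 3, b = 2 and a = 2, b = 3 from exactly exponential and exactly power-law data.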
Part II: Designing Experiments and Collecting Data:
Sampling Methods:
The Bad:
Voluntary sample. A voluntary sample is made up of people who decide for themselves to be in the survey.
Example: Online poll
Convenience sample. A convenience sample is made up of people who are easy to reach.
Example: interviewing people at the mall or in the cafeteria, simply because those are easy places to reach people.
The Good:
Simple random sampling. Simple random sampling refers to a method in which all possible samples of n objects are equally
likely to occur.
Example: assign a number from 1 to 100 to each member of a population of size 100. Numbers are selected one at a time from a table of random digits or with a random number generator, skipping any repeats. The first 10 different numbers selected identify the sample.
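The labeling procedure in the example can be sketched in Python, with `random.Random` standing in for the random number generator. The function name `srs_by_labels` and the seed are hypothetical; the loop skips repeated labels so that 10 different members are chosen. Python's built-in `random.sample` does the same job in one call.

```python
import random

def srs_by_labels(pop_size, n, seed=None):
    """Draw labels 1..pop_size one at a time, skipping repeats, until n distinct labels are chosen."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        label = rng.randint(1, pop_size)  # one random label, like reading a random digit table
        if label not in chosen:           # a repeated label is ignored
            chosen.append(label)
    return chosen

sample = srs_by_labels(100, 10, seed=1)                # hypothetical seed
shortcut = random.Random(1).sample(range(1, 101), 10)  # built-in SRS without replacement
print(sample)
print(shortcut)
```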
Stratified sampling. With stratified sampling, the population is divided into groups based on some characteristic. Then, within each group, a simple random sample (SRS) is taken. In stratified sampling, the groups are called strata.
Example: For a national survey, we divide the population into groups, or strata, based on geography: north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
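A minimal Python sketch of stratified sampling, assuming the population is already split into strata (here a hypothetical dictionary keyed by region, with made-up member labels). An SRS of the same size is drawn within each stratum; real surveys often size each stratum's sample differently, which this sketch does not attempt.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum members from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population of 200 people labeled by region
strata = {region: [f"{region}-{i}" for i in range(50)]
          for region in ("north", "east", "south", "west")}
sample = stratified_sample(strata, 5, seed=2)
for region, people in sample.items():
    print(region, people)
```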
Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen using an SRS. Only individuals within sampled clusters are surveyed.
Example: Randomly choose high schools in the country and only survey people in those schools.
Difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes subjects from each stratum. With cluster sampling, the sample includes subjects only from sampled clusters.
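The contrast between the two methods shows up clearly in code. In this Python sketch of cluster sampling, an SRS picks whole clusters and everyone in a chosen cluster is included, so unchosen clusters contribute nobody (unlike stratified sampling, where every stratum is represented). The school names, sizes, and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """SRS of cluster names; every member of each chosen cluster is surveyed."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical: 10 high schools with 30 students each
schools = {f"school{s}": [f"school{s}-student{i}" for i in range(30)] for s in range(10)}
sample = cluster_sample(schools, 3, seed=3)
print(sorted(sample))
```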
Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods.
Example: Stage 1, use cluster sampling to choose clusters from a population. Then, in Stage 2, we use simple random sampling
to select a subset of subjects from each chosen cluster for the final sample.
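The two-stage example can be sketched in Python by chaining the two steps: an SRS of clusters, then an SRS of members inside each chosen cluster. The cluster names, sample sizes, and the function name `two_stage_sample` are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: SRS of clusters. Stage 2: SRS of members within each chosen cluster."""
    rng = random.Random(seed)
    stage1 = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in stage1}

# Hypothetical: 10 schools, 30 students each; pick 3 schools, then 5 students per school
schools = {f"school{s}": [f"school{s}-student{i}" for i in range(30)] for s in range(10)}
sample = two_stage_sample(schools, 3, 5, seed=4)
for school, students in sample.items():
    print(school, students)
```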
Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the
list, we randomly select the first sample element from the first k subjects on the population list. Thereafter, we select
every kth subject on the list.
Example: Select every 5th person on a list of the population, starting from a person chosen at random among the first 5.
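The systematic procedure can be sketched in Python with list slicing: choose a random starting position among the first k entries, then take every kth entry. The population list and the function name `systematic_sample` are hypothetical; with k = 5 and 100 people, the sample always has 20 people.

```python
import random

def systematic_sample(population, k, seed=None):
    """Random start among the first k members, then every kth member after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # index 0..k-1, i.e. one of the first k members
    return population[start::k]

population = list(range(1, 101))   # hypothetical list of 100 people, labeled 1-100
sample = systematic_sample(population, 5, seed=5)
print(sample)
```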
Experimental Design:
A well-designed experiment includes design features that allow researchers to eliminate extraneous variables as an explanation
for the observed relationship between the independent variable(s) and the dependent variable.
Experimental Unit or Subject: The individuals on which the experiment is done. If they are people, we call them subjects.
Factor: An explanatory variable in the study.
Level: The degree or value of each factor.
Treatment: The condition applied to the subjects. When there is one factor, the treatments and the levels are the same.
Control. Control refers to steps taken to reduce the effects of other variables (i.e., variables other than the independent variable
and the dependent variable). These variables are called lurking variables.
Control involves making the experiment as similar as possible for subjects in each treatment condition. Three control strategies
are control groups, placebos, and blinding.
Control group. A control group is a group that receives no treatment, a placebo, or an existing standard treatment, and serves as a baseline for comparison.
Placebo. A fake or dummy treatment.
Blinding: Not telling subjects whether they receive the placebo or the treatment.
Double blinding: Neither the researchers nor the subjects know who gets the treatment or the placebo.
Randomization. Randomization refers to the practice of using chance methods (random number tables, flipping a coin, etc.) to
assign subjects to treatments.
Replication. Replication refers to the practice of assigning each treatment to many experimental subjects.
Bias: when a method systematically favors one outcome over another.
Types of design:
Completely randomized design. With this design, subjects are randomly assigned to treatments.
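Random assignment for a completely randomized design can be sketched in Python: shuffle the subjects, then deal them into treatment groups in turn so the groups come out equal in size when possible. Subject labels, treatment names, the seed, and the function name `completely_randomized` are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Randomly assign every subject to one treatment group."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                      # chance decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)  # deal round-robin
    return groups

subjects = [f"S{i}" for i in range(1, 21)]     # hypothetical 20 subjects
groups = completely_randomized(subjects, ["treatment", "placebo"], seed=6)
for name, members in groups.items():
    print(name, members)
```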
Randomized block design. With this design, the experimenter divides subjects into subgroups called blocks. Then, subjects within each block
are randomly assigned to treatment conditions. Because this design reduces variability and potential confounding, it produces a
better estimate of treatment effects.
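In a Python sketch of a randomized block design, random assignment to treatments is carried out separately inside each block, so every block is split evenly across treatments. The blocks (here a hypothetical split by gender), subject labels, treatment names, and the function name `randomized_block` are assumptions for illustration.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Within each block, shuffle the subjects and deal them into treatments."""
    rng = random.Random(seed)
    assignment = {}
    for block, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)                  # randomization happens inside the block
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block, treatments[i % len(treatments)])
    return assignment

blocks = {"men": [f"M{i}" for i in range(8)],      # hypothetical blocks
          "women": [f"W{i}" for i in range(6)]}
assignment = randomized_block(blocks, ["A", "B"], seed=7)
for subject, (block, treatment) in sorted(assignment.items()):
    print(subject, block, treatment)
```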
Matched pairs design. This is a special case of the randomized block design. It is used when the experiment has only two treatment
conditions and subjects can be grouped into pairs based on some blocking variable. Then, within each pair, subjects are
randomly assigned to different treatments.
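For matched pairs, the randomization reduces to a coin flip inside each pair deciding which member gets which treatment. In this Python sketch, the pairs (for example, twins), the treatment labels, and the function name `matched_pairs` are hypothetical.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """Flip a fair coin within each pair to decide who receives which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:                 # heads: first member gets treatments[0]
            assignment[first], assignment[second] = treatments
        else:                                  # tails: the order is swapped
            assignment[second], assignment[first] = treatments
    return assignment

pairs = [("twin1a", "twin1b"), ("twin2a", "twin2b"), ("twin3a", "twin3b")]  # hypothetical
assignment = matched_pairs(pairs, seed=8)
for first, second in pairs:
    print(first, assignment[first], "|", second, assignment[second])
```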