● On the AP Test, STATE THE VALUE YOU FOUND before you explain the
context. (ex: “r² = 0.955. Context: 95.5% of the variation in
….”)
Semester 1 Review
Unit 1
Qualitative (Categorical): Takes on Categories of Data (Gender, Political
Affiliation) - use with χ²-distribution or Z-Distribution
● Use: Tables, Pie Charts, Dotplots, Bar Charts, Segmented/Side-by-Side Bar
Charts
Quantitative: Is Numerical (Age, Total Cost, etc.) - use w/ T-Distribution
● Use: Histograms, Scatterplots, Dot Plots, Stem-and-Leaf plots, Boxplots
● Types of Variables: Discrete (Countable) vs. Continuous (measured, not
countable - ex: weight)
To Describe/Compare Distributions: S (Shape) O (Outliers) C (Center) S
(Spread)
● Shape: Modality (How many “Peaks” the data has - Unimodal, Bimodal,
etc.) and Shape of Data (Symmetric, Skewed Left/Right, Uniform)
- Skewed Left: Tail of Data to the LEFT/ Mean pulled towards the left
-> Median > Mean
- Skewed Right: Tail of Data to the RIGHT/ Mean pulled to the right ->
Median < Mean
- In Symmetric distributions, Mean ≈ Median (Unimodal alone does NOT
guarantee this)
● Outliers: Use IQR Rule (outliers are below Q1 - 1.5 * IQR or above
Q3 + 1.5 * IQR) or use 2 S.D. Rule (For Normal Distributions, outliers are
more than 2 standard deviations from the mean -> outside μ ± 2σ)
● Center: Use mean or median, approximate if necessary. (Q1 and Q3 measure
position, not center.)
● Spread: Use Standard Deviation, Range, IQR, etc.
- What is an IQR: Q3 - Q1, the range of the middle half (middle 50%) of
the data
Largest S.D. -> Data with the Largest Spread (values Farthest from the Mean -
Huge Gaps between Data, for example)
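A minimal Python sketch of the IQR Rule described above. It assumes the "median of each half" quartile method (the median is left out of both halves when n is odd); other software may use a different quartile convention and give slightly different fences. The function names and the sample data are made up for illustration.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For odd n, skip the middle value; for even n, split evenly.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)

def iqr_fences(data):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def iqr_outliers(data):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(data)
    return [x for x in data if x < low or x > high]

sample = [2, 4, 5, 5, 6, 7, 8, 9, 30]  # made-up data
print(quartiles(sample))     # (4.5, 8.5)
print(iqr_fences(sample))    # (-1.5, 14.5)
print(iqr_outliers(sample))  # [30]
```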
Frequency v. Rel Frequency: Frequency == the counts, Rel. Frequency == the
proportions
Percentile: % of Values less than or equal to value -> 25th percentile has 25% of
the values below it.
Z Score: Shows # of standard deviations above/below the mean, calculated w/
z = (xᵢ - μ)/σ
- Is Standardized (Has No Units)
- Both Z-Scores and Percentiles can be used for ANY DISTRIBUTION
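A short Python sketch of the z-score formula and the "less than or equal to" percentile definition from the notes. The function names and the numbers are made up for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile_of(value, data):
    """Percent of the data values that are less than or equal to value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

# Made-up example: a score of 85 when the mean is 70 and the S.D. is 10.
print(z_score(85, 70, 10))       # 1.5

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up data
print(percentile_of(7, scores))  # 70.0
```

Neither function assumes a Normal distribution, which matches the note that both measures work for any distribution.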
Normal Distribution: Symmetric, Bell Shaped
- Use NormalCDF to calculate Proportions. When using z-scores w/
Normalcdf, μ=0, σ=1.
- invNorm calculates the value OR z-score given the proportion (area).
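For checking calculator work, here is a rough Python equivalent of normalcdf and invNorm built on the standard library's statistics.NormalDist. The wrapper names normalcdf and inv_norm are my own, chosen to mirror the calculator. Like invNorm, inv_norm takes the area to the LEFT of the value.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area, mu=0.0, sigma=1.0):
    """Value (or z-score when mu=0, sigma=1) with the given area to its LEFT."""
    return NormalDist(mu, sigma).inv_cdf(area)

# Proportion within 1 standard deviation of the mean (about 0.68).
print(round(normalcdf(-1, 1), 4))

# z-score with 97.5% of the area to its left (about 1.96).
print(round(inv_norm(0.975), 2))
```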
Example Problems
1) Scientists working for a water district measure the water level in a lake each
day. The daily water level in the lake varies due to weather conditions and
other factors. The daily water level has a distribution that is approximately
normal with mean water level of 84.07 feet. The probability that the daily
water level in the lake is at least 100 feet is 0.064. Which of the following is
closest to the probability that on a randomly selected day the water level in
the lake will be at least 90 feet?
● To solve the probability, we’ll need to find the missing Standard
Deviation. invNorm takes the area to the LEFT, and 0.064 is the area
to the RIGHT of 100, so use invNorm(1 - 0.064, 0, 1) =
invNorm(0.936, 0, 1) = 1.522. Next, set the z-score equal to the
z-score formula -> 1.522 = (100 - 84.07)/σ, so σ ≈ 10.47. Finish with
normalcdf(90, 1E99, 84.07, 10.47) ≈ 0.29.
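The same steps can be checked in Python. This is a sketch that uses statistics.NormalDist in place of the calculator's invNorm and normalcdf; the variable names are my own.

```python
from statistics import NormalDist

mean = 84.07             # given mean water level (feet)
p_at_least_100 = 0.064   # given P(level >= 100)

# Step 1: z-score for 100 feet. inv_cdf takes the area to the LEFT,
# so convert the right-tail probability first.
z_100 = NormalDist().inv_cdf(1 - p_at_least_100)

# Step 2: solve z = (100 - mean) / sigma for sigma.
sigma = (100 - mean) / z_100

# Step 3: P(level >= 90) is the area to the right of 90.
p_at_least_90 = 1 - NormalDist(mean, sigma).cdf(90)

print(round(z_100, 3), round(sigma, 2), round(p_at_least_90, 2))
```

The printed values should come out near z ≈ 1.522, σ ≈ 10.47, and a probability of about 0.29.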
Unit 2
Scatterplots: Relate bivariate (2 variable) data - x (explanatory/independent
variable - influences or explains change) and y (response/dependent variable -
outcome of the change caused by explanatory variable).
- Describe scatterplots with Strength, Direction, and Form (ex:
Strong/Moderate/Weak, Positive/Negative, Linear/Curved/No Pattern)
Least Squares Regression Line: Describes how y changes as x changes (predicts
y given x value), and minimizes the sum of the SQUARED residuals (residual =
Actual - Predicted y value). Sum of residuals is ALWAYS 0.
● Always passes through the point (x̄, ȳ)
● Equation: ŷ = a + bx, where ŷ = predicted y, x = value of the explanatory
variable, a = y-intercept, b = slope. ALWAYS DEFINE VARIABLES!!
- To Interpret Slope: “For every 1-unit increase in x, the y is predicted
to change by b.”
- To Interpret y-Intercept: “When x is 0, the predicted value of y is a.”
- To Calculate Slope: b = r(Sy/Sx)
● Extrapolation: Using the LSR line to predict y for x-values that are far
outside the range of the data. DO NOT DO THIS!!! The linear pattern may not
hold out there.
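A small Python sketch tying the LSRL facts above together: it computes r, gets the slope from b = r(Sy/Sx), and picks the intercept so the line passes through (x̄, ȳ), i.e. a = ȳ - b·x̄. The function name lsrl and the data are made up for illustration.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r) for the least squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    n = len(xs)
    # r = sum of (z-score of x) * (z-score of y), divided by n - 1
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x                 # slope: b = r(Sy/Sx)
    a = y_bar - b * x_bar             # forces the line through (x-bar, y-bar)
    return a, b, r

xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values
a, b, r = lsrl(xs, ys)

# Residual = Actual - Predicted; these should sum to (essentially) 0.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(a, 2), round(b, 2), round(r, 3))  # 2.2 0.6 0.775
print(round(abs(sum(residuals)), 6))          # 0.0
```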