Simpson's Paradox Correct Answer Occurs when a trend or result that appears in groups of data disappears or reverses when the groups are combined.
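A small sketch of how pooling groups can flip a comparison. The counts below are made up purely for illustration, and the names `groups`, `rate`, and `combined` are hypothetical. Option A has the higher success rate inside each group, yet option B looks better once the groups are added together, because the two options are spread unevenly across the groups.

```python
# Made-up counts for illustration: (successes, trials) per option in each group.
groups = {
    "group 1": {"A": (8, 10), "B": (70, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


# Within each group, compare the success rate of A and B.
for name, options in groups.items():
    print(name, {opt: round(rate(*counts), 3) for opt, counts in options.items()})

# Combine the groups by summing successes and trials for each option.
combined = {}
for opt in ("A", "B"):
    successes = sum(groups[g][opt][0] for g in groups)
    trials = sum(groups[g][opt][1] for g in groups)
    combined[opt] = rate(successes, trials)

print("combined", {opt: round(r, 3) for opt, r in combined.items()})
```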
Regression Analysis Correct Answer A process for estimating and analyzing the relationship between variables.
Least-Squares Regression Line Correct Answer AKA the line of best fit. y = mx + b, where m is the slope ("rise over run") and b is the y-intercept.
Interpolation Correct Answer Using the Least-Squares Regression Line to make a prediction WITHIN the range of the existing data.
Extrapolation Correct Answer Estimation by projecting OUTSIDE known data points.
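The three cards above can be tied together with a short sketch: fit a least-squares line, then label a prediction as interpolation or extrapolation depending on whether x falls inside the observed range. The data points and the helper names `fit_line`, `predict`, and `kind_of_prediction` are assumptions made up for this example.

```python
# Made-up (x, y) data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]


def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b


m, b = fit_line(xs, ys)


def predict(x):
    """Predicted y on the fitted line."""
    return m * x + b


def kind_of_prediction(x):
    """Interpolation if x lies within the observed x range, else extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


for x in (2.5, 10):
    print(x, round(predict(x), 2), kind_of_prediction(x))
```

Predictions labeled as extrapolation deserve extra caution, since nothing in the data shows the linear trend continues outside the observed range.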
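p-value Correct Answer The probability of getting a result at least as extreme as the one observed if chance alone were at work (that is, if the null hypothesis were true)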
Significance Levels in Hypothesis Testing Correct Answer Statistically Significant: If a measured p-value is less than the significance level.
NOT Statistically Significant: If a measured p-value is MORE than the significance level.
Empirical Rule Correct Answer 68 - 95 - 99.7
68% is w/in 1 standard deviation of the mean
95% is w/in 2 standard deviations of the mean
99.7% is w/in 3 standard deviations of the mean
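The 68 - 95 - 99.7 figures can be checked against a normal distribution at 1, 2, and 3 standard deviations from the mean. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The rule only applies to data that is approximately normal (bell-shaped and symmetric).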
Used for Categorical (Qual) Data Correct Answer Bar Chart
Pie Chart
Mean Correct Answer Average - adding up all the values in a series, then dividing by the number of values in that series
Center & Spread Measures Correct Answer SYMMETRIC - Mean (center) / Standard Deviation (spread)
SKEWED - Median (center) / IQR (spread)
CATEGORICAL - Mode (center) / no spread
What is the difference between a bar chart and a histogram? Correct Answer BAR CHART - Categorical data distributed over groups of categories
HISTOGRAM - Quantitative data distributed over various intervals
Outlier Formula Correct Answer Any value more than 1.5 x IQR above Q3 or below Q1
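A sketch of the 1.5 x IQR rule in code, on a made-up data set. Quartiles are computed with `statistics.quantiles` using its default method; textbooks and calculators use slightly different quartile conventions, so Q1 and Q3 may differ a little from hand-worked answers. The variable names are hypothetical.

```python
from statistics import quantiles

# Made-up data for illustration; 30 sits far from the rest.
data = [2, 4, 5, 6, 7, 8, 9, 30]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences: anything below the lower fence or above the upper fence is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```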
Graphical Displays for Quantitative Variables Correct Answer HISTOGRAMS - large data sets
STEM PLOTS - keeps individual data points
DOT PLOTS - smaller data sets
BOX PLOTS - center, spread and 5 number summary
SCATTER PLOTS - two quantitative variables plotted on the x and y axes
5 Number Summary Correct Answer Min - Q1 - Median - Q3 - Max
Graphical Displays and Corresponding Numerical Analysis Tools for Examining Data Relationships Correct Answer C > C - Two-way Frequency Tables (aka Contingency Table) use conditional percentages and relative frequencies
C > Q - Side-by-Side Boxplots use 5 number summaries
Q > Q - Scatterplots use correlation coefficients
InterQuartile Range (IQR) Correct Answer Q3-Q1
Standard Deviation Correct Answer Roughly the average distance each data point is from the mean (calculated as the square root of the variance)
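"Average distance from the mean" is a useful intuition, but the standard deviation is computed from squared distances, so it usually differs from the plain average of the distances. The sketch below compares the two on made-up data, using the standard-library `statistics` module; `mean_abs_dev` is a name chosen for this example.

```python
from statistics import mean, pstdev, stdev

# Made-up data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data)

# Literal "average distance from the mean" (mean absolute deviation).
mean_abs_dev = mean(abs(x - center) for x in data)

# Standard deviation: square root of the average squared distance.
population_sd = pstdev(data)  # divides by n
sample_sd = stdev(data)       # divides by n - 1

print("mean:", center)
print("mean absolute deviation:", mean_abs_dev)
print("population SD:", population_sd)
print("sample SD:", round(sample_sd, 3))
```

Because squaring gives extra weight to points far from the mean, the standard deviation is never smaller than the mean absolute deviation.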
3 Types of Percentages on a Two-Way Frequency Table Correct Answer OVERALL% - (aka Relative Frequencies) When all of the counts are divided by the overall total
CONDITIONAL ROW% - when the explanatory variable is in rows; calculated by dividing the joint frequencies of each row by the row total
CONDITIONAL COLUMN% - when the explanatory variable is in the columns; calculated by dividing joint frequencies of each column by column total
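The three percentage types can be sketched on a small two-way table. The counts, the row labels ("group X", "group Y"), and the column labels ("yes", "no") are made up for illustration, and the variable names are hypothetical.

```python
# Made-up two-way table of counts: rows are one variable, columns the other.
counts = {
    "group X": {"yes": 30, "no": 20},
    "group Y": {"yes": 10, "no": 40},
}
columns = ["yes", "no"]

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {r: sum(counts[r].values()) for r in counts}
col_totals = {c: sum(counts[r][c] for r in counts) for c in columns}

# OVERALL %: every count divided by the overall total.
overall = {r: {c: counts[r][c] / grand_total for c in columns} for r in counts}

# CONDITIONAL ROW %: each count divided by its row total.
row_pct = {r: {c: counts[r][c] / row_totals[r] for c in columns} for r in counts}

# CONDITIONAL COLUMN %: each count divided by its column total.
col_pct = {r: {c: counts[r][c] / col_totals[c] for c in columns} for r in counts}

for label, table in (("overall", overall), ("row %", row_pct), ("column %", col_pct)):
    print(label)
    for r in counts:
        print("  ", r, {c: round(table[r][c], 3) for c in columns})
```

Row percentages sum to 100% across each row, column percentages sum to 100% down each column, and overall percentages sum to 100% across the whole table.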
Right and Left Skew Correct Answer Right Skewed is positive (tail li.)