Learning Objectives:
● Construct a simple regression model to examine the association between two quantitative
variables
● Interpret the parameters in a simple regression model
● Interpret the sums of squares in a simple regression model
● Draw a conclusion about the formulated hypothesis using statistical tests
● Explain how parameter estimation and the significance test of the simple regression model
work
● Name the assumptions of the simple regression model
(1) Hypothesis Testing
Process
1. Formulate a hypothesis
○ What do you expect?
2. Check study characteristics and variables
○ Sampling procedure
○ Experimental design
○ Measurement levels
3. Descriptive analyses
○ What are the sample characteristics, and how are the relevant variables distributed (e.g.
mean, M, and standard deviation, SD)?
4. Inferential analyses
○ Test relation or differences, including a check of the model diagnostics
5. Interpret and report results
○ APA format
E.g. Is class size associated with academic performance?
Non-directional
● x predicts y → class size is associated with academic performance
Directional
● Positive association: higher x predicts higher y (or vice versa)
○ Average performance increases when class size increases
○ Average performance decreases when class size decreases
● Negative association: higher x predicts lower y (or vice versa)
○ Performance typically is better in smaller classes
○ Performance typically is worse in larger classes
Checking study characteristics and variables
● Cross-sectional study → across randomly selected schools in the Netherlands
● Predictor → class size: measured as the average class size in a school (quantitative)
● Criterion/outcome → academic performance: school’s average grade on a standardised test
(quantitative)
1.1 Descriptive Statistics
Univariate descriptives: describes single variables
● Shape: bell-shaped (skewed/uniform/bimodal)
● Location: mean (median/mode)
● Scale: standard deviation (SD) or variance (min/max)
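As a minimal sketch of these univariate descriptives, the snippet below computes location and scale measures for a single variable using Python's standard library. The class-size values are made up for illustration and are not data from the study; shape is best judged from a histogram, which is not shown here.

```python
import statistics

# Hypothetical average class sizes for six schools (illustrative values only)
class_size = [18, 22, 25, 25, 28, 32]

descriptives = {
    "mean": statistics.mean(class_size),
    "median": statistics.median(class_size),
    "mode": statistics.mode(class_size),
    "sd": statistics.stdev(class_size),           # sample SD (n - 1 in the denominator)
    "variance": statistics.variance(class_size),  # sample variance (SD squared)
    "min": min(class_size),
    "max": max(class_size),
}

for name, value in descriptives.items():
    print(f"{name}: {value:.2f}")
```

Note that `statistics.stdev` and `statistics.variance` use the sample (n - 1) versions, which is the usual choice when describing a sample drawn from a larger population.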
Scatterplots: visualise the association between response (y) and explanatory (x) variables
● Every dot is an observation
● Inspect: is a linear model ($\hat{y} = a + bx$) appropriate to describe the association?
○ If yes, we can use least squares estimation to estimate the linear prediction equation
→ the best-fitting straight line, falling closest to all data points in the scatterplot
1.1.1 Least Squares Estimation
$\hat{y} = a + bx$
How a and b are determined:
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$a = \bar{y} - b\bar{x}$
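The least squares formulas for a and b can be sketched in plain Python as follows. The function name `least_squares` and the class-size and grade values are assumptions made for illustration only; they are not taken from the study.

```python
def least_squares(x, y):
    """Return intercept a and slope b of the line y_hat = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of b: sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of b: sum of squared deviations of x from its mean
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: average class size and average test grade per school
class_size = [18, 22, 25, 25, 28, 32]
avg_grade = [7.4, 7.1, 6.9, 6.8, 6.6, 6.2]

a, b = least_squares(class_size, avg_grade)
print(f"y_hat = {a:.3f} + {b:.3f} * x")
```

Because a is computed as the mean of y minus b times the mean of x, the fitted line always passes through the point of means.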
Slope
b is positive when:
● Low x-values often coincide with low y-values, and high x-values coincide with high y-values
→ schools with larger classes on average perform better
b is negative when:
● Low x-values often coincide with high y-values, and high x-values coincide with low y-values
→ schools with larger classes on average perform worse
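The sign of b follows from the numerator of the slope formula, because the denominator (a sum of squares) is always positive. The sketch below lists each observation's cross-product of deviations for two small made-up data sets; the helper name `cross_products` is hypothetical.

```python
def cross_products(x, y):
    """Return each observation's (x - mean of x) * (y - mean of y)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]
y_up = [2, 3, 5, 6, 9]    # low x with low y, high x with high y
y_down = [9, 6, 5, 3, 2]  # low x with high y, high x with low y

pos = cross_products(x, y_up)
neg = cross_products(x, y_down)

# Observations below both means or above both means contribute positive products,
# so their sum (and therefore b) is positive; mixed pairs contribute negative products.
print(pos, sum(pos))
print(neg, sum(neg))
```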