Lectures, articles & chapters Field
Week 1
Lecture chapter 2: The spine of statistics
Statistical models
• We fit models to our data: we use a statistical model to represent what is
happening in the real world
• Models consist of parameters and variables
o Variables: measured constructs (e.g. fatigue) that vary across people in the
sample.
o Parameters: estimated from the data and represent constant relations
between variables in the model.
• We compute the model parameters in the sample to estimate their values in the
population
Linear regression
• Slope
• Intercept (height)
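The two parameters of a straight-line model can be illustrated with a minimal sketch. It assumes ordinary least squares as the estimation method, which these notes do not specify. The function name fit_line and the data are hypothetical, chosen so that the points lie exactly on y = 1 + 2x.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x.

    Assumption: ordinary least squares; the notes only name the two parameters.
    """
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products of deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                    # change in y per unit change in x
    intercept = y_bar - slope * x_bar    # predicted y when x = 0 (the "height")
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
intercept, slope = fit_line(x, y)
print(f"intercept = {intercept}, slope = {slope}")
```

Here x and y are the variables (they differ per case), while the intercept and slope are the parameters estimated from the data.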
Model fit
• Mean: a model of what happens in the real world → typical score
o It is not a perfect representation of the data
o How can we assess how well the mean represents reality?
Standard deviation: the spread around the mean. It tells us how much the observations
in our sample differ from the sample mean.
Standard error: tells us not how well the sample mean represents the sample itself,
but how well the sample mean represents the population mean. The standard error is
the standard deviation of the sampling distribution of a statistic.
• For a given statistic (mean) it tells us how much variability there is in this statistic
across samples from the same population. Large values, therefore, indicate that
a statistic from a given sample may not be an accurate reflection of the
population from which the sample came.
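The idea that the standard error is the standard deviation of a sampling distribution can be checked with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with hypothetical values (mean 50, SD 10), and the comparison value uses the standard formula for the standard error of the mean, SD divided by the square root of N, which the notes do not spell out.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 50.0, 10.0  # hypothetical population values
N = 25                         # size of each sample
N_SAMPLES = 2000               # number of samples drawn from the population

# Draw many samples from the same population and keep each sample mean
sample_means = []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(mean(sample))

# Standard deviation of the sample means = empirical standard error
empirical_se = stdev(sample_means)
# Standard formula for the standard error of the mean
theoretical_se = POP_SD / N ** 0.5

print(f"empirical SE:   {empirical_se:.2f}")
print(f"theoretical SE: {theoretical_se:.2f}")
```

The two values should be close. Increasing N shrinks both, meaning the sample mean becomes a more accurate reflection of the population mean.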
Sum of squared errors: a ‘total’ and, therefore, affected by the number of data points.
Variance: the ‘average’ variability, but expressed in squared units.
Calculating the error
• The mean is the value from which the (squared) scores deviate least (it has the
least error).
• Measures that summarize how well the mean represents the sample data:
o Sum of squared errors, mean squared error/variance, standard deviation
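The claim that the mean has the least squared error can be checked numerically. The sketch below uses hypothetical scores and a hypothetical helper, sum_squared_errors, and compares the mean against a few nearby candidate values.

```python
from statistics import mean


def sum_squared_errors(scores, value):
    """Sum of squared deviations of the scores from a candidate value."""
    return sum((s - value) ** 2 for s in scores)


scores = [1, 3, 4, 3, 2]  # hypothetical sample
m = mean(scores)

# Compare the mean with values slightly below and above it
candidates = [m - 1, m - 0.5, m, m + 0.5, m + 1]
for c in candidates:
    print(f"value = {c:.1f}  SSE = {sum_squared_errors(scores, c):.2f}")

# The smallest SSE among the candidates belongs to the mean
best = min(candidates, key=lambda c: sum_squared_errors(scores, c))
```

Moving the candidate value away from the mean in either direction increases the sum of squared errors.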
Mean squared error: the average squared error per person when the model is used to
describe the data.
• Total dispersion depends on sample size → more informative to compute the
average dispersion: the mean of the squared errors (MSE)
• We average by dividing by the degrees of freedom (N-1) because we use sample
data to estimate the model fit in the population.
o We ‘lose’ 1 degree of freedom because we estimate the population mean
with the sample mean.
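The steps from total to average dispersion can be sketched as follows. The scores are hypothetical, and the helper name sse_and_mse is an illustration only. The second call repeats every score once to show that the total (SSE) grows with sample size while the average (MSE) stays in the same range.

```python
from statistics import mean, variance


def sse_and_mse(scores):
    """Return (SSE, MSE) around the sample mean, averaging over N - 1."""
    m = mean(scores)
    sse = sum((s - m) ** 2 for s in scores)  # total dispersion
    df = len(scores) - 1                     # one df lost: the mean is estimated
    return sse, sse / df


scores = [1, 3, 4, 3, 2]  # hypothetical sample
sse_small, mse_small = sse_and_mse(scores)

# Same scores, each observed twice: twice the data, same spread
sse_large, mse_large = sse_and_mse(scores * 2)

print(f"N = 5:  SSE = {sse_small:.2f}, MSE = {mse_small:.2f}")
print(f"N = 10: SSE = {sse_large:.2f}, MSE = {mse_large:.2f}")

# statistics.variance also divides by N - 1, so it matches the MSE
print(variance(scores))
```

Doubling the data doubles the SSE, whereas the MSE changes only slightly, which is why the average is the more informative summary.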
The mean as a model: variance as a simple measure of model fit
• General principle of model fit: sum (SSE) or average (MSE) the squared
deviations from the model.
o Larger values indicate lack of fit → the more accurate your model is, the
smaller the discrepancy will be between each individual case in your
dataset and your model's prediction
• When the model is the mean, the MSE is called variance.
• The square root of the variance (s²) is called the standard deviation (s)
• The standard deviation has an intuitively more appealing interpretation → the
average deviation from the mean, not in squared units
Standard deviation (SD): the spread around the mean. The square root of the mean square (MS).
• The most commonly used measure of spread
• The mean can be 5 when all three scores are 5 (no spread around the mean,
SD = 0) or when the scores are 1, 5 and 9 (there is spread around the mean)
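The two sets of scores in this example (three scores of 5, versus 1, 5 and 9) can be worked through in a short sketch. It uses Python's statistics module, whose variance and stdev functions divide by N - 1, in line with the degrees of freedom used in these notes. The variable names are illustrative.

```python
from statistics import mean, stdev, variance

no_spread = [5, 5, 5]  # everyone scores 5
spread = [1, 5, 9]     # scores of 1, 5 and 9

for label, scores in (("5, 5, 5", no_spread), ("1, 5, 9", spread)):
    m = mean(scores)
    var = variance(scores)  # mean square: SSE / (N - 1)
    sd = stdev(scores)      # square root of the variance
    print(f"scores {label}: mean = {m}, variance = {var}, SD = {sd}")
```

Both sets have a mean of 5. For 1, 5 and 9 the deviations are -4, 0 and 4, so the sum of squared errors is 32, the variance is 32 / 2 = 16, and the SD is 4. For 5, 5, 5 every deviation is 0, so the SD is 0.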