Lecture 1: Introduction and refresher on inferential statistics
Population mean (µ)
Population mean= the average of the population
- $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
o N= population size
Variation= the dispersion of the scores around the mean
- Low: everyone has 5 (mean is 5)
- High: 50% has 1 and 50% has 10 (mean is 5.5)
Population standard deviation (σ) & variance (σ²)
Standard deviation= a measure of how dispersed the data are in relation to the mean
- How far does each score lie from the mean
- $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
Variance= the average squared deviation of the observations from their mean
- $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
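The three formulas above can be checked by hand on a tiny population. The sketch below is only an illustration: the function names and the two example populations are made up here, not taken from the lecture.

```python
import math

def population_mean(scores):
    """mu = sum(X_i) / N"""
    return sum(scores) / len(scores)

def population_variance(scores):
    """sigma^2 = sum((X_i - mu)^2) / N"""
    mu = population_mean(scores)
    return sum((x - mu) ** 2 for x in scores) / len(scores)

def population_sd(scores):
    """sigma = square root of the population variance"""
    return math.sqrt(population_variance(scores))

# Hypothetical populations, chosen only for illustration
no_spread = [5, 5, 5, 5]        # everyone has the same score
wide_spread = [1, 1, 10, 10]    # half score 1, half score 10

for name, scores in [("no_spread", no_spread), ("wide_spread", wide_spread)]:
    print(name, population_mean(scores),
          population_variance(scores), population_sd(scores))
```

When everyone has the same score, every deviation from the mean is zero, so the variance and standard deviation are zero as well.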
Hypothesis
H0= the claim that the effect being studied does not exist
Ha= the claim that the effect being studied does exist
P-value & calculation
p-value= probability of obtaining results at least as extreme as the observed ones, assuming H0 is true
- How likely is it, that your data could have occurred under the null hypothesis
- Can be found in a table after calculating the test-statistic t
- Which you compare to a significance level
o Commonly accepted significance level α = 0.05
o t-test= the difference between the sample mean and the mean expected if H0 were true, divided by the standard error
- $t = \frac{\bar{X} - \mu}{SE_{\bar{X}}}$
o $SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
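A minimal sketch of the calculation above. All numbers are hypothetical. Because the formula uses a known σ, the p-value here is read from the standard normal distribution as a stand-in for the table lookup; with σ estimated from a small sample, a t-table would be used instead.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """SE = sigma / sqrt(n)"""
    return sigma / math.sqrt(n)

def t_statistic(sample_mean, mu_0, se):
    """t = (sample mean - mean under H0) / SE"""
    return (sample_mean - mu_0) / se

# Hypothetical values, for illustration only
sample_mean = 103.0   # observed sample mean
mu_0 = 100.0          # mean if H0 were true
sigma = 15.0          # population standard deviation (assumed known)
n = 100               # sample size
alpha = 0.05

se = standard_error(sigma, n)              # 1.5
t = t_statistic(sample_mean, mu_0, se)     # 2.0

# Two-sided p-value, normal approximation (an assumption of this sketch)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
reject_h0 = p_two_sided < alpha

print(f"SE={se:.2f}, t={t:.2f}, p={p_two_sided:.4f}, reject H0: {reject_h0}")
```

With these numbers p is about 0.046, just below α = 0.05, so H0 would be rejected.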
Confidence interval
Confidence interval= the range of null-hypothesis values that would not be rejected given the current sample mean
- Range (95%): sample mean ± 1.96 SE
o Upper bound: sample mean + 1.96 SE
o Lower bound: sample mean - 1.96 SE
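The bounds can be computed directly. This is a sketch with hypothetical inputs; the function name is made up for illustration.

```python
import math

def confidence_interval_95(sample_mean, sigma, n):
    """Return (lower, upper) = sample mean -/+ 1.96 * SE, with SE = sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical values: sample mean 103, sigma 15, n = 100
lower, upper = confidence_interval_95(103.0, 15.0, 100)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")

# A null-hypothesis value outside the interval would be rejected at alpha = 0.05
h0_value = 100.0
print("H0 value inside interval:", lower <= h0_value <= upper)
```

Here the interval runs from about 100.06 to 105.94, so a hypothesized mean of 100 falls just outside it.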
Sample and Sampling distribution
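Central limit theorem= when the sample size is large (rule of thumb: n > 30), the sampling distribution of the mean is approximately a normal distribution
Sampling distribution= the probability distribution of a statistic (e.g. the mean) obtained from a large number of samples drawn from the same population (approximately normal for large samples)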
Sample distribution= the distribution of scores within a single sample, i.e. a smaller part of the population
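A small simulation can make the difference between the two distributions concrete. The sketch below assumes a made-up, strongly skewed population (exponential) and draws many samples of n = 50. The sample means should cluster around the population mean with a spread close to σ/√n, even though the population itself is not normal.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical skewed population (exponential with mean 1), illustration only
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50              # size of each sample (n > 30)
num_samples = 2000  # number of samples drawn

# Sampling distribution of the mean: one mean per sample
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
expected_se = pop_sd / n ** 0.5

print(f"population mean      = {pop_mean:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
print(f"sd of sample means   = {sd_of_means:.3f}")
print(f"sigma / sqrt(n)      = {expected_se:.3f}")
```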
Lecture 2A: Simple regression
Regression analysis
Regression analysis= a method for estimating the relationship between variables
- X-axis: the presumed cause (independent variable)
- Y-axis: the presumed outcome, caused by something (dependent variable)
- Predicting values on a Y-variable based on one or more X-variables
By trying to predict one variable based on scores on another variable, we can find out what the association between
political phenomena looks like
Explanatory purpose: informative about the causal relationship
- But correlation doesn’t imply causation
Descriptive purpose: even without a causal relation, the association is interesting in its own right
Simple linear regression model
- Intercept= where the line crosses the y-axis, i.e. the predicted value of Y when X is 0
- Regression coefficient= how much the predicted value of Y changes when X increases by one unit
- Residual= the vertical distance between an observed point and the regression line
Predicted values= values on Y for each case based on the estimated model
- $\hat{Y}_i = b_0 + b_1 X_i$
Observed value= values on Y for each case that we actually observe in the sample
- $Y_i = b_0 + b_1 X_i + \varepsilon_i$
Residuals= the difference between the observed values and the predicted values
- $\varepsilon_i = Y_i - \hat{Y}_i$
Simple: one independent variable (X)
Linear: the effect of X on Y can be represented by a straight line
- Steepness of the line (slope) given by the value of regression coefficient
o Multiply the value of X of each case i by b1
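The notes do not state how b0 and b1 are estimated. The sketch below assumes ordinary least squares, the usual choice for simple linear regression, and uses made-up data to compute the intercept, slope, predicted values and residuals.

```python
def fit_simple_regression(x, y):
    """Estimate b0 and b1 by ordinary least squares (an assumption of this sketch)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope: change in predicted Y per unit of X
    b0 = mean_y - b1 * mean_x    # intercept: predicted Y when X is 0
    return b0, b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_regression(x, y)

# Predicted values: Y-hat_i = b0 + b1 * X_i
predicted = [b0 + b1 * xi for xi in x]

# Residuals: e_i = Y_i - Y-hat_i
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("predicted:", [round(v, 2) for v in predicted])
print("residuals:", [round(v, 2) for v in residuals])
```

For these data the fit is b0 = 2.2 and b1 = 0.6, and the residuals sum to zero (up to rounding), which is a property of least-squares fits that include an intercept.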
Testing significance
H0: b1 = 0
Ha: b1 ≠ 0
t-test: tests the statistical significance of the regression coefficients
- $t = \frac{b_1}{SE_{b_1}}$, where $b_1$ is the estimated coefficient
- Corresponds with a p-value
P-value= the probability that you would have found an estimate for b1 at least as far from zero as the one in your sample if X & Y were completely unrelated in the population
- p < α: reject H0
o Effect of X on Y is statistically significant
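A sketch of the test on made-up data. The notes do not give the formula for the standard error of b1; the one used here is the standard least-squares expression, the square root of (sum of squared residuals / (n - 2)) divided by the sum of squared deviations of X. The critical value 3.182 is the two-sided t-table value for α = 0.05 with n - 2 = 3 degrees of freedom.

```python
import math

def slope_t_statistic(x, y):
    """Return (b1, SE of b1, t) for a simple least-squares regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals around the fitted line
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope (standard OLS formula, assumed here)
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)
    return b1, se_b1, b1 / se_b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b1, se_b1, t = slope_t_statistic(x, y)

# Two-sided critical t-value for alpha = 0.05 and 3 degrees of freedom (t-table)
T_CRITICAL_DF3 = 3.182
significant = abs(t) > T_CRITICAL_DF3

print(f"b1 = {b1:.2f}, SE = {se_b1:.3f}, t = {t:.3f}, significant: {significant}")
```

With only five cases, t is about 2.12 and stays below the critical value, so H0: b1 = 0 is not rejected even though the estimated slope is positive.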