Multivariate thinking: the realization that many economic factors play a role, some more
important than others.
With regression analysis, we can see which factors are more important than others.
Simple regression: one independent variable, one dependent variable
Multiple regression: two or more independent variables, one dependent variable.
A regression model is probabilistic: it consists of a deterministic component and a random
error term (the error term represents the influences of all factors that are not in the
model).
Probabilistic model: the expected value of the error term is 0, so the expected value of y
equals the deterministic component.
The least squares method finds the regression line for which the sum of squared deviations
(residuals) is smallest (also called OLS, ordinary least squares, regression).
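As a sketch, the least squares estimates of a simple regression can be computed directly from their closed-form formulas. The data below are made-up illustration values, not from the course:

```python
# Minimal OLS sketch using only the standard library; x and y are invented data.
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

xbar, ybar = mean(x), mean(y)
# Slope b1 minimises the sum of squared deviations (residuals).
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar  # intercept: the fitted line passes through (xbar, ybar)
print(b0, b1)  # -> 0.14 1.96 (approximately)
```

Any other line through the same points gives a larger sum of squared deviations; that is what "least squares" means.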
Gauss-Markov theorem: in a linear regression model that satisfies the OLS assumptions, the
least squares estimator is BLUE (best linear unbiased estimator).
- Best = the parameter estimates have the smallest variance
- Unbiased: the expected value of an estimated parameter is equal to its population
value (if you take more samples, the value of the parameter comes closer to the
population value)
- An unbiased estimator with the smallest variance is called efficient.
- An estimator that comes closer to its population value as the sample size increases
is called “consistent”.
OLS assumptions:
1. The regression model is linear, correctly specified and contains an error term
2. The error term has a zero population mean (expected value of error term = 0)
3. The independent variables are not correlated with the error term
4. Observations of the error term are uncorrelated with each other (no autocorrelation)
5. The error term has a constant variance (no heteroskedasticity)
6. No explanatory variable is a perfect linear function of another explanatory variable
(no perfect multicollinearity)
7. The error term is normally distributed (required for statistical testing)
About assumption one, correctly specified means that:
- All variables that should be in the model are in the model (no omitted variables)
- The shape (functional form) of the relationship is modelled correctly
If a relevant variable is missing from the model, or the relationship is not linear:
- The expected value of the error term may deviate from zero
- There may be a correlation between an independent variable and the error term
- There may be correlations between observations of the error term
(autocorrelation).
Steps for simple regression:
- Form a hypothesis
- Estimate beta zero and beta one from the sample values
- Evaluate the model
o Test for significance of coefficients (t-test)
o Compute the coefficient of determination (R2)
- Interpret your findings.
Testing the coefficients tests for a linear relationship between X and Y:
H0: B1 = 0 (no relationship)
Ha: B1 is not 0 (relationship)
If the four assumptions about the error term hold, the estimator of beta one is normally
distributed, and a t-test can be used to check whether beta one differs from zero.
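The t-test for beta one can be sketched as follows, using only the standard library; the data are made-up illustration values:

```python
# Sketch of the t-test for the slope of a simple regression; x and y are invented data.
from math import sqrt
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residual variance s^2 = SSE / (n - 2), then SE(b1) = s / sqrt(Sxx).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = sqrt(sse / (n - 2) / sxx)

# Test H0: B1 = 0 against Ha: B1 is not 0.
t_stat = b1 / se_b1
print(t_stat)  # compare with the critical t value for n - 2 degrees of freedom
```

Here the t-statistic is far above the two-sided 5% critical value for 3 degrees of freedom (about 3.18), so H0 would be rejected.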
Coefficient of determination (R2) shows how much of the variance of Y can be explained by
the relationship with X.
The larger R2 is, the higher the explanatory power of the independent variables.
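R2 can be computed as one minus the unexplained share of the total variation (SSE/SST); a minimal sketch with made-up illustration data:

```python
# Sketch of the coefficient of determination R^2 for a simple regression;
# x and y are invented data, standard library only.
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

xbar, ybar = mean(x), mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
sst = sum((yi - ybar) ** 2 for yi in y)                        # total variation
r2 = 1 - sse / sst  # share of the variance of Y explained by X
print(r2)
```

For these data R2 is close to 1, i.e. almost all variation in Y is explained by X.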
Steps for a multiple regression:
The same as simple regression, except:
- You also need to test the significance of the model as a whole (F-test)
Hypotheses for the F-test:
H0: B1 = B2 = B3 = 0 (no relationships)
Ha: at least one beta is not zero (relationship)
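Assuming illustrative values for n, k and R2 (not from the notes), the F-statistic for the model as a whole can be computed from R2:

```python
# Sketch of the F-test for overall model significance, computed from R^2.
# n, k and r2 are assumed illustration values.
n = 30      # number of observations
k = 3       # number of independent variables
r2 = 0.40   # coefficient of determination of the fitted model

# H0: B1 = B2 = B3 = 0; Ha: at least one beta is not zero.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(f_stat)  # compare with the critical F value for (k, n - k - 1) df
```

If the computed F exceeds the critical F value, at least one coefficient differs significantly from zero.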
Lecture 2: dummy variables.
You have four types of variables:
- Nominal (religion)
- Ordinal (education)
- Interval (year)
- Ratio (age)
To include a variable measured below the interval level in a regression, you have to use
dummies.
If the variable is dichotomous (gender for example, male or female), add the dummy gender to
the regression, which has value 0 for male and 1 for female.
Now you have the regression: income = B0 + B1*gender
B0 is the income for men! B1 is the difference in income between women and men.
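A minimal sketch of this dummy regression with made-up incomes: with a 0/1 dummy, the intercept equals the men's mean income and the slope equals the difference between the group means:

```python
# Sketch of a regression on a dichotomous dummy (0 = male, 1 = female);
# the incomes are invented illustration values, standard library only.
from statistics import mean

gender = [0, 0, 0, 1, 1]
income = [2000, 2200, 2400, 2500, 2700]

xbar, ybar = mean(gender), mean(income)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(gender, income)) / \
     sum((xi - xbar) ** 2 for xi in gender)
b0 = ybar - b1 * xbar

# b0 = mean income of the 0-group (men); b1 = difference in group means (women - men).
print(b0, b1)
```

Here the men's incomes average 2200 and the women's 2600, so the fit gives B0 = 2200 and B1 = 400.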