Chapter 2 – Review of Probability
A set is any collection of real numbers. An interval is the set of all real numbers between two fixed endpoints. A closed interval includes both endpoints: [a, b]. An open interval excludes both endpoints: (a, b). The probability of an outcome is the proportion of the time that the outcome occurs in the long run. A random variable is any variable whose value cannot be predicted exactly. A discrete random variable has a finite number of possible outcomes. A continuous random variable has infinitely many possible outcomes. The population is the set of all possible values of the random variable. If we observe the entire population, we can calculate the outcomes of interest. A sample is the subset of the population we actually observe. Which particular sample we draw is random. If we observe only a sample, we need to estimate the outcomes of interest. The expected value of a random variable Y is the long-run average of the random variable. The sample average is:
$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$
The variance of Y is a measure of dispersion and is given by:
$\mathrm{Var}(Y) = \sigma_Y^2 = E\left[(Y - \mu_Y)^2\right]$
The sample variance is:
$\hat{\sigma}_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
The skewness is a measure of the asymmetry of a distribution. The kurtosis measures how heavy the tails of a distribution are.
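As a numerical illustration (not from the original notes), the sketch below computes the sample average, sample variance, skewness, and kurtosis of a small made-up data set; NumPy and SciPy are assumed, and `ddof=1` reproduces the 1/(n−1) convention used above.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Made-up sample of Y, for illustration only
y = np.array([2.1, 3.4, 2.9, 5.0, 4.2, 3.3, 6.1, 2.8])
n = len(y)

y_bar = y.mean()                               # sample average: (1/n) * sum of Y_i
s2_y = ((y - y_bar) ** 2).sum() / (n - 1)      # sample variance with the 1/(n-1) factor

print(y_bar, s2_y)
print(np.var(y, ddof=1))                       # same sample variance via NumPy (ddof=1)
print(skew(y))                                 # sample skewness (asymmetry)
print(kurtosis(y, fisher=False))               # sample kurtosis (tail heaviness, non-excess)
```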
Given two discrete random variables X and Y, their joint probability distribution gives the probability that the random variables simultaneously take on certain values. If knowing the value of Y provides no information about the value of X, we say that X and Y are independent. Given two random variables X and Y, their covariance is defined as:
$\mathrm{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$
The sample covariance is:
$\hat{\sigma}_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
The correlation coefficient is defined as:
$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$
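A minimal sketch, assuming two short made-up arrays for X and Y, showing the sample covariance and correlation computed directly from the formulas and cross-checked against NumPy's `np.cov` and `np.corrcoef` (which use the same 1/(n−1) convention).

```python
import numpy as np

# Made-up paired observations of X and Y (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2])
n = len(x)

# Sample covariance: (1/(n-1)) * sum of (X_i - Xbar)(Y_i - Ybar)
s_xy = ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)

# Correlation coefficient: covariance divided by the product of standard deviations
rho = s_xy / (x.std(ddof=1) * y.std(ddof=1))

print(s_xy, np.cov(x, y)[0, 1])        # manual covariance vs NumPy's (off-diagonal entry)
print(rho, np.corrcoef(x, y)[0, 1])    # manual correlation vs NumPy's
```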
The conditional expectation of Y given X, written E(Y | X = x), is the expected value of Y given a specific value of X. Averaging the conditional expectation over X gives back the unconditional mean (the law of iterated expectations):
$E\left( E(Y \mid X) \right) = E(Y)$
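The identity above can be checked by simulation. The sketch below assumes a made-up data-generating process in which X is binary and the mean of Y depends on X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Made-up data-generating process: X is 0 or 1, and E(Y | X) = 1 + 2 * X
x = rng.integers(0, 2, size=n)
y = 1 + 2 * x + rng.normal(size=n)

# Inner step: conditional mean of Y for each value of X
cond_means = np.array([y[x == v].mean() for v in (0, 1)])

# Outer step: average the conditional means using the distribution of X
p_x = np.array([(x == v).mean() for v in (0, 1)])
print((p_x * cond_means).sum())   # E(E(Y | X)), close to 2
print(y.mean())                   # E(Y), also close to 2
```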
Chapter 4 – Linear Regression with One Regressor
There are two types of questions we can answer:
a. Prediction: using observed values of X to predict Y . For prediction we distinguish two types:
a. In-Sample: Prediction for those observations for which we observe all variables.
b. Out-of-Sample: Prediction for those observations for which we do not observe the outcome; these are observations outside our sample.
b. Causal inference: do changes in X cause changes in Y?
Correlation does not imply causation. For prediction, correlations are enough. For causation, we
need more. To understand how two variables behave together, we want to relate:
a. The outcome variable, Y, also known as the dependent variable: what we want to explain.
b. The input variable, X, known as the independent variable, explanatory variable, or regressor: what we use to explain the outcomes.
Univariate Linear Regression Model:
We start by modelling the relationship between X and Y linearly:
$Y_i = \beta_0 + \beta_1 X_i + u_i \qquad (\text{LRM.U})$
Univariate means there is one independent variable. The model has a disturbance term, $u_i$, the measure of our ignorance. The intercept is $\beta_0$, and $\beta_1$, the slope coefficient, is the marginal effect. We want to explain the variation in Y with the variation in X. How do X and Y behave together? We denote estimates of parameters by a hat:
$Y_i = \beta_0 + \beta_1 X_i + u_i = \underbrace{\hat{\beta}_0 + \hat{\beta}_1 X_i}_{=\hat{Y}_i} + \hat{u}_i = \hat{Y}_i + \hat{u}_i$
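A short simulation sketch of the model (LRM.U), with made-up parameter values $\beta_0 = 2$ and $\beta_1 = 0.5$; `np.polyfit` is used here only as a convenient least-squares fitter to produce fitted values and residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

beta0, beta1 = 2.0, 0.5                  # made-up population parameters

# Generate data from Y_i = beta0 + beta1 * X_i + u_i
x = rng.normal(size=n)
u = rng.normal(size=n)                   # disturbance term
y = beta0 + beta1 * x + u

# Least-squares fit; np.polyfit returns the slope first, then the intercept
b1_hat, b0_hat = np.polyfit(x, y, 1)
y_hat = b0_hat + b1_hat * x              # fitted values Y_hat_i
u_hat = y - y_hat                        # residuals u_hat_i, so Y_i = Y_hat_i + u_hat_i

print(b0_hat, b1_hat)                    # close to 2.0 and 0.5
```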
$R^2$ is the fraction of the sample variance of Y explained by X. $R^2$ never decreases when more regressors are added; it stays the same only if the estimated coefficient on the added regressor is exactly zero.
a. Total Sum of Squares: $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
b. Explained Sum of Squares: $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
c. Sum of Squared Residuals: $SSR = \sum_{i=1}^{n} \hat{u}_i^2$
$TSS = ESS + SSR$, and this identity can be used to calculate $R^2$:
$R^2 = 1 - \frac{SSR}{TSS} \in [0, 1]$
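Continuing in the same spirit (made-up data, `np.polyfit` as the least-squares fitter), the sketch below computes TSS, ESS, and SSR directly, verifies the decomposition $TSS = ESS + SSR$, and computes $R^2$ two equivalent ways.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)   # made-up data, as in the sketch above

b1_hat, b0_hat = np.polyfit(x, y, 1)     # least-squares fit with an intercept
y_hat = b0_hat + b1_hat * x
u_hat = y - y_hat

tss = ((y - y.mean()) ** 2).sum()        # total sum of squares
ess = ((y_hat - y.mean()) ** 2).sum()    # explained sum of squares
ssr = (u_hat ** 2).sum()                 # sum of squared residuals

print(np.isclose(tss, ess + ssr))        # TSS = ESS + SSR holds for a fit with an intercept
print(1 - ssr / tss, ess / tss)          # R^2 computed both ways
```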
Maximising $R^2$ is equivalent to minimising $SSR$. We therefore choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimise $SSR = \sum_{i=1}^{n} \hat{u}_i^2$.
Minimising the sum of squared residuals is called Ordinary Least Squares (OLS). In the univariate linear regression model, the OLS estimate of the intercept is:
$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
The OLS estimate of the slope is the sample covariance of X and Y divided by the sample variance of X:
$\hat{\beta}_1 = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_X^2}$
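A minimal sketch, on made-up data, checking that the closed-form intercept and slope above reproduce what a generic least-squares routine (`np.polyfit`) returns.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)             # made-up data for illustration

# Slope: sample covariance of X and Y divided by the sample variance of X
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Intercept: Ybar minus the slope estimate times Xbar
b0_hat = y.mean() - b1_hat * x.mean()

# Cross-check against a generic least-squares fit (slope first, intercept second)
b1_ref, b0_ref = np.polyfit(x, y, 1)
print(np.isclose(b0_hat, b0_ref), np.isclose(b1_hat, b1_ref))
```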
There are four OLS properties:
1. The sample average of the residuals is zero:
$\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = 0$
2. The sample average of the predicted values $\hat{Y}_i$ equals the sample average of the dependent variable Y:
$\frac{1}{n}\sum_{i=1}^{n} \hat{Y}_i = \bar{Y}$
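The two properties listed above can be verified numerically; the sketch below reuses the same kind of made-up data and `np.polyfit` fit as before.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)       # made-up data for illustration

b1_hat, b0_hat = np.polyfit(x, y, 1)         # least-squares fit with an intercept
y_hat = b0_hat + b1_hat * x
u_hat = y - y_hat

print(np.isclose(u_hat.mean(), 0.0))         # property 1: residuals average to zero
print(np.isclose(y_hat.mean(), y.mean()))    # property 2: average fitted value equals Ybar
```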