• One or several independent variables influence the dependent variable, which is the
variable that you actually want to measure.
Chapter 1 – Overview of regression analysis
What is econometrics
Econometrics (economic measurement) – Quantitative measurement and analysis of actual
economic and business phenomena. It attempts to quantify economic reality (by examining
data and quantifying the actions of firms, consumers and governments) and to bridge the gap
between abstract economic theory and real human activity.
Three major uses:
• Describe economic reality;
o Making a theoretical equation explicit by estimating it from data, e.g. consumer
demand for a specific product based on past consumption, income and prices.
▪ Q = β0 + β1P + β2PS + β3Yd
▪ Q = 27.7 – 0.11P + 0.03PS + 0.23Yd
• The numbers in the second equation (27.7, –0.11, 0.03 and 0.23) are
called estimated regression coefficients.
• Test hypotheses about economic theory and policy;
o Evaluating alternative theories with quantitative evidence.
o Testing whether the product in the equation above really is a normal good (higher
price, lower Q; higher PS, higher Q; higher disposable income, higher Q) can be
done by statistically testing the estimated coefficients.
▪ A coefficient should not just have the expected positive/negative sign; it
should also be statistically significant.
• Forecast future economic activity.
o Accuracy depends on the degree to which the past is a good guide for the
future.
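As a rough illustration of describing and forecasting, the estimated demand equation above can be evaluated in a few lines of Python. The coefficients are the ones from the notes; the function name and the input values for P, PS and Yd are made up for this sketch. Reading off a sign like this is not a statistical test, it only shows what the estimated coefficients imply.

```python
# Estimated coefficients taken from the notes: Q = 27.7 - 0.11P + 0.03PS + 0.23Yd
COEFFICIENTS = {"intercept": 27.7, "P": -0.11, "PS": 0.03, "Yd": 0.23}


def predicted_quantity(p, ps, yd):
    """Plug values of P, PS and Yd into the estimated equation."""
    c = COEFFICIENTS
    return c["intercept"] + c["P"] * p + c["PS"] * ps + c["Yd"] * yd


# Hypothetical input values, chosen only for illustration.
base = predicted_quantity(p=100, ps=80, yd=200)
after_price_rise = predicted_quantity(p=101, ps=80, yd=200)

print(f"Predicted Q: {base:.2f}")
print(f"Change in Q after a one-unit price rise: {after_price_rise - base:.2f}")
```

The change in Q after the one-unit price rise equals the coefficient on P, which is how an estimated coefficient is read.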
Econometrics is considered observational or nonexperimental quantitative research,
where the following approaches are used:
• Specifying the models/relationships to be studied;
o Called the art of econometrics -> Theory-based skill.
• Collecting data needed to quantify the models;
• Quantifying the models with the data.
o There are many ways to quantify models, but the one used in this book is
single-equation linear regression analysis.
Econometric results should always be evaluated critically, because several things can go wrong:
• Missing/inaccurate data;
• Incorrectly formulated relationships;
• Poorly chosen estimating techniques;
• Improper statistical testing procedures.
Econometrics is there to estimate the size of a change, rather than only its direction:
• Higher prices -> Lower demand
o Knowledge of economic theory and the characteristics of the product itself
• How much less demand following the higher prices
o Econometrics -> Regression analysis
What is regression analysis
Regression analysis – A statistical technique that explains movements in one variable (DV) as
a function of movements in a set of other variables (IV/EV) through the quantification of one
or more equations. It is used because most economic propositions can be stated in such equations:
• Q = β0 + β1P + β2PS + β3Yd
o Q = dependent variable
o P, PS, and Yd are independent/explanatory variables
Regression analysis and its results can only show the strength and direction of the
relationships involved (and whether they are statistically significant), not causality
(whether one variable actually causes the other).
Single-equation linear model – Single-equation because the model has only one specified equation:
• Y = β0 + β1X
o β0 = Constant/intercept -> Value of Y when X = 0.
o β1 = Slope coefficient -> Amount that Y will change when X increases by 1.
▪ Also: β1 = (Y2 − Y1)/(X2 − X1) = ΔY/ΔX = rise/run
• Linear because the result gives a straight line (rather than a curve).
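A minimal sketch of the intercept and slope definitions above, assuming hypothetical values for β0 and β1 (they do not come from the notes). It evaluates the line at two points and recovers the slope as rise over run.

```python
def line(x, beta0, beta1):
    """Deterministic linear model Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x


def slope(x1, y1, x2, y2):
    """Rise over run between two points on the line."""
    return (y2 - y1) / (x2 - x1)


# Hypothetical coefficients, for illustration only.
beta0, beta1 = 2.0, 0.5

x_a, x_b = 4.0, 10.0
y_a = line(x_a, beta0, beta1)
y_b = line(x_b, beta0, beta1)

intercept = line(0.0, beta0, beta1)       # value of Y when X = 0
recovered_slope = slope(x_a, y_a, x_b, y_b)

print(f"Intercept: {intercept}, slope: {recovered_slope}")
```

Because the model is a straight line, any two points on it give the same slope.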
Stochastic error (disturbance) term – A term added to a regression equation to account for
all variation in Y that can’t be explained by the included X-s. This variation comes from
(1) omitted explanatory variables (X2/X3), (2) omitted influences, (3) measurement error,
(4) incorrect functional form and (5) purely random/unpredictable outcomes.
• Econometrician’s ignorance or inability to model all movements of the DV.
• Symbol = ε
• Y = β0 + β1X + ε
o Part 1 = deterministic component
▪ Also: E(Y|X) = β0 + β1X – The expected value of Y given X -> The
mean value of the Ys associated with a particular value of X.
• If the average height of all 13-year-old girls is 5 feet, then 5 feet is
the expected value of a girl’s height given her age being 13.
o Part 2 = stochastic/random component
▪ The error is added to the equation above because not all 13-year-olds
are exactly 5 feet tall: Y = E(Y|X) + ε = β0 + β1X + ε
1+2. Omitted explanatory variables and influences, e.g. because they are unavailable.
• E.g. uncertainty over the future course of the economy.
o This ends up in the error term because it’s hard to measure consumer uncertainty.
3. Measurement error present in the DV.
• E.g. sampling error (measuring a sample rather than the whole population) giving
different results.
4. Underlying theoretical equation has a different functional form/shape than the one
chosen for the regression.
• E.g. the true relationship is nonlinear while a linear function was estimated.
5. All attempts to generalise human behaviour must contain at least some amount of
unpredictable/purely random variation.
• A random event might occur that can’t be anticipated and may not ever be
repeated.
These five sources explain the difference between the observed values [Y] and the expected
values from the deterministic component [E(Y|X)].
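The split between the deterministic and the stochastic component can be sketched with a small simulation. Everything numeric here is an assumption made for illustration: the coefficient values, the spread of the error term, and the choice to draw ε from a normal distribution with mean zero.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

BETA0, BETA1 = 10.0, 2.0  # hypothetical true coefficients
SIGMA = 3.0               # hypothetical spread of the error term


def expected_y(x):
    """Deterministic component: E(Y|X) = beta0 + beta1 * X."""
    return BETA0 + BETA1 * x


def observed_y(x):
    """Observed value: deterministic component plus a random error."""
    epsilon = random.gauss(0.0, SIGMA)
    return expected_y(x) + epsilon


x = 5.0
draws = [observed_y(x) for _ in range(10_000)]
errors = [y - expected_y(x) for y in draws]
mean_y = sum(draws) / len(draws)

print(f"E(Y|X) at X = {x}: {expected_y(x):.2f}")
print(f"Mean of simulated Ys: {mean_y:.2f}")
print(f"First three errors: {[round(e, 2) for e in errors[:3]]}")
```

Individual observations differ from E(Y|X) by their own error, while the mean of many observations at the same X lies close to E(Y|X).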
There can also be more than one independent variable. Each variable then gets a subscript for
the observation (e.g. 1 for the first year); in general terms i is used, running from 1 to N.
This leads to a multivariate regression model.
• Yi = β0 + β1X1i + β2X2i + β3X3i + εi
o Regression coefficient β1 here measures the impact of a one-unit increase in
X1 holding constant X2 and X3.
▪ It is very difficult to run controlled economic experiments, because
many economic factors change simultaneously and may influence
each other -> Solution is regression models and econometrics.
▪ If a variable is not included in the model (as X1, X2, X3 etc.), then it
ends up in the error term and is not held constant when measuring the
impact of X1 (so the omitted variable may influence the results).
o The i-s here denote the different sample persons/units -> observation number.
▪ Different observations have different Y, X and ε values, but the β
coefficients are the same for all observations.
▪ With time series (sample consisting of years/months), the i is replaced
by a t to denote time.
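What "holding constant X2 and X3" means can be sketched with the deterministic part of a multivariate equation. The coefficient values and the X values below are hypothetical and only serve the illustration.

```python
# Hypothetical coefficients for Y = b0 + b1*X1 + b2*X2 + b3*X3 (error term left out).
BETA = {"const": 1.0, "x1": 2.0, "x2": -3.0, "x3": 0.5}


def deterministic_y(x1, x2, x3):
    """Deterministic part of the multivariate regression equation."""
    return BETA["const"] + BETA["x1"] * x1 + BETA["x2"] * x2 + BETA["x3"] * x3


base = deterministic_y(x1=10, x2=4, x3=6)

# X1 rises by one unit while X2 and X3 are held constant.
change_only_x1 = deterministic_y(x1=11, x2=4, x3=6) - base

# X1 and X2 both rise by one unit at the same time.
change_x1_and_x2 = deterministic_y(x1=11, x2=5, x3=6) - base

print(f"Change with X2 and X3 held constant: {change_only_x1}")
print(f"Change when X2 moves as well: {change_x1_and_x2}")
```

Only the first change equals β1. When X2 moves together with X1, the change in Y mixes both effects, which is the situation a multivariate regression is meant to untangle.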
Dummy variable – A variable that can only take on two values (0 or 1), e.g. gender.
The estimated regression equation
Estimated regression equation – Quantified version of the theoretical regression equation,
obtained from a sample of data for actual Xs and Ys.
Yi = β0 + β1Xi + εi becomes Ŷi = 103.40 + 6.38Xi (Ŷi = β̂0 + β̂1Xi)
• Y-hat = the estimated/fitted value of Y.
• 103.40 and 6.38 are estimates of β0 and β1 calculated from data; the fitted values
they produce can be compared with the observed values of Y.
o The beta-hats are estimated regression coefficients, obtained from a sample
of the Y-s and X-s.
• The closer Ŷ is to Y, the better the equation ‘fits’.
o Residual – The difference between Ŷ and Y -> The difference between the
observed Y and the estimated regression line (Ŷ).
▪ ei = Yi - Ŷi
Residual is not the same as error: the error is the difference between the observed Y and
the true regression equation (the expected value of Y -> E(Y|X)).
• The residual can be considered an estimate of the error term. Unlike the error
term, it can actually be measured in the real world, while the error term is a
theoretical concept.
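A short sketch of how residuals are computed from an estimated equation. The equation Ŷi = 103.40 + 6.38Xi is the one from the notes; the three (X, Y) observations are hypothetical and only there to have something to subtract.

```python
def fitted(x):
    """Estimated equation from the notes: Y-hat = 103.40 + 6.38 * X."""
    return 103.40 + 6.38 * x


# Hypothetical sample of (X, Y) observations, for illustration only.
sample = [(5.0, 140.0), (9.0, 157.0), (13.0, 205.0)]

# Residual e_i = Y_i - Y-hat_i for every observation.
residuals = [y - fitted(x) for x, y in sample]

for (x, y), e in zip(sample, residuals):
    print(f"X = {x:5.1f}  Y = {y:6.1f}  Y-hat = {fitted(x):7.2f}  e = {e:6.2f}")
```

A positive residual means the observed Y lies above the estimated line and a negative one means it lies below. The errors themselves cannot be computed this way, because the true β0 and β1 are unknown.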
Example of regression analysis → Weight-guessing
• Summer job → Weight guesser with customers paying $2 each:
o You have to guess the weight within 10 pounds:
▪ If you miss by more than 10 pounds → Return $2 + small prize worth $3
▪ If you guess within 10 pounds → Keep $2
o There are marks on the wall so you can read off the person’s height.
o Apart from this, the only information you can deduce is the person’s gender.
• Make a sample of males → Relationship between weight (DV) and height: