Summary

Applied Economics Summary

Pages
16
Uploaded on
19-05-2021
Written in
2018/2019

An intro to statistics and econometrics for economics students in their initial years.

Introduction to Data, Economic Modelling and Econometrics

Economic models help us to understand economic phenomena and forecast changes. Models suggest a relationship. We use data and statistical tools to
test these relationships and their magnitude. Tools used in econometrics are statistical in nature.

What data do applied economists work with?
• Datasets are collections of realisations of random variables
- Datasets usually include several variables X, Y, Z, ...
- We have a value for each of these variables for each of the N observations in the dataset
- Each observation is a realisation of the random variable
- (X1, X2, ..., XN), (Y1, Y2, ..., YN), (Z1, Z2, ..., ZN)

• Data differ by their unit of observation (or level):
- Individual person, household, firm
- Aggregated at geographical areas, e.g. countries

Data used in applied economics
• Time series data
- Same unit observed at different points in time
- Good for investigating effects of variables which vary over time e.g. stock prices, inflation. Ordered chronologically
- Used in applications of macro-economic models e.g. UK economic growth between 1979 and 2012
• Cross sectional data
- Data from units observed at the same time. The timing doesn’t have to be exact, e.g. different weeks of the same year. Ordering doesn’t matter
- Good for investigating relationships between variables which vary between individuals at any point in time
- E.g. productivity and output of firms in the UK at one point in time, wages and education of workers, incomes
• Combination of cross-sectional and time series data
- If these are linked by observational unit, they are called panel or longitudinal data. A panel consists of a time series for each cross-sectional member in the data
- Example: longitudinal surveys that follow the same cohort of individuals from childhood through old age (family environment, schooling, wages, retirement etc.)
- If not linked, they are repeated cross sections, e.g. a 1985 survey of households on income, savings etc., followed by a 1990 survey of different households with the same questions
- Main difference is that in a panel, the same units (e.g. individuals, firms) are followed over a given time period
- So data on houses sold in 1993 and 1995 are not a panel, as the houses sold are different
- Panel data are good for investigating life-cycle phenomena and evaluating government policy
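
The distinction between panel data and repeated cross sections comes down to whether the same units reappear in every wave. A minimal Python sketch of that check follows. The function name `classify` and all the toy data are hypothetical, and it assumes a balanced panel (real panels often lose or gain units between waves).

```python
def classify(waves):
    """Classify a list of survey waves, each a dict keyed by unit ID.

    Simplifying assumption: a panel is balanced, i.e. exactly the same
    units appear in every wave.
    """
    if len(waves) == 1:
        return "cross section"
    ids = [set(w) for w in waves]          # unit IDs observed in each wave
    same_units = all(s == ids[0] for s in ids)
    if same_units and len(ids[0]) == 1:
        return "time series"               # one unit followed over time
    if same_units:
        return "panel"                     # many units, all followed over time
    return "repeated cross sections"       # units differ between waves

# Hypothetical toy data (values are made up for illustration).
houses_1993 = {"house_A": 120, "house_B": 95}
houses_1995 = {"house_C": 130, "house_D": 101}
cohort_2000 = {"person_1": 20, "person_2": 25}
cohort_2010 = {"person_1": 31, "person_2": 29}

print(classify([houses_1993, houses_1995]))
print(classify([cohort_2000, cohort_2010]))
```

The house sales come out as repeated cross sections because no house appears in both years, while the cohort comes out as a panel.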


Sources of data used in applied economics
• Most popular source of data used in economics is survey data
- Collected on a sample of the population of interest but samples not always representative of the population
- Typically rely on surveys collected by a third party (e.g. government) but becoming popular to collect primary data

• Economists also increasingly use administrative or register based data
- Data are on the entire population rather than a sample
- Administrative data are not collected for the purpose of research, but for statistical or accounting purposes
- Advantages: Very large number of observations, often very precise measurements
- Disadvantages: Small number of variables, often very confidential

• Empirical analysis uses data to test a theory or estimate a relationship. We may construct a formal economic model, which consists of mathematical equations that describe relationships. We then turn it into an econometric model by specifying the form of the function and how to deal with variables that are too hard to observe. We then collect data on the variables, estimate the parameters and test hypotheses. Other times, we can create an informal model using intuition.

• We try to determine whether a variable has a causal effect on another variable, but correlation must not be confused with causation. We use the ceteris paribus (other things equal) condition to isolate a causal effect.

Economic Theoretical Model
• Production function: h is human capital per worker, total labour input is hL, A is a measure of productivity, and α is capital’s share of income, with 0 < α < 1

• Per-worker production function: output = productivity × factors of production
• We want to estimate the income ratio: ratio of output = ratio of productivity × ratio of factors of production
• How to measure y, k, h? Where to find the data? Empirical proxy for α? The larger the ratio, the greater the income inequality.
• y: GDP per capita, k: physical capital per worker, h: number of years of schooling. Income differences are due to productivity and factor differences.
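
The definitions above are consistent with a Cobb-Douglas production function. The block below writes the three relationships out under that assumption. The functional form is assumed here rather than taken from the notes, K denotes the capital stock, and the subscripts 1 and 2 stand for two hypothetical countries.

```latex
\begin{align}
Y &= A\,K^{\alpha}(hL)^{1-\alpha}, \qquad 0 < \alpha < 1 \\
y &\equiv \frac{Y}{L} = A\,k^{\alpha}h^{1-\alpha}, \qquad k \equiv \frac{K}{L} \\
\frac{y_1}{y_2} &= \frac{A_1}{A_2} \times \frac{k_1^{\alpha}h_1^{1-\alpha}}{k_2^{\alpha}h_2^{1-\alpha}}
\end{align}
```

The second line follows from dividing the first by L, which is why output per worker splits into productivity (A) times a factors-of-production term.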


• We transform the per-worker production function into an equation relating growth rates, so growth rate of output = growth rate of productivity + growth rate of factors of production
- where a hat (^) means the growth rate of that variable, e.g. Â is the growth rate of A
• We can rearrange this for Â since the growth rates of y, k and h can be measured. This is called growth accounting. Â is the Solow residual
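
As a rough numerical sketch of growth accounting, the Python below backs out productivity growth as a residual. It assumes the Cobb-Douglas growth-rate equation ŷ = Â + αk̂ + (1 − α)ĥ, which matches the definitions above but is not written out in the notes. The function name `solow_residual` and the growth rates are hypothetical.

```python
def solow_residual(g_y, g_k, g_h, alpha):
    """Productivity growth implied by g_y = g_A + alpha*g_k + (1 - alpha)*g_h.

    Assumes a Cobb-Douglas per-worker production function; alpha is
    capital's share of income.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return g_y - alpha * g_k - (1 - alpha) * g_h

# Hypothetical annual growth rates (not real data): output per worker 3%,
# capital per worker 4%, human capital per worker 1%, capital share 1/3.
g_A = solow_residual(g_y=0.03, g_k=0.04, g_h=0.01, alpha=1 / 3)
print(round(g_A, 4))
```

With these made-up numbers, about one percentage point of the three percent growth in output per worker is left to productivity.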


• Correlation is only a statement of numerical facts; it says nothing about cause and effect. Correlation doesn’t equal causality.
• Consider a positive correlation between X and Y. There are three possible explanations:
- X causes Y: causation runs from X to Y, e.g. rain and umbrellas
- Causation may run the other way or in both directions (reverse causality), e.g. health and income
- There is no direct causal relationship between X and Y, but some third variable, Z, causes both X and Y. In this case, Z is called an omitted variable
e.g. Data description: readers of tabloids are more hostile to immigration. Data interpretation: we want to know the extent to which tabloid reading means
you are more hostile to immigration. We create a theory of the causal process A=f(N) i.e. attitudes to immigration are a function of newspaper readership.

Considerations
• Representativeness of the sample: has the sample been drawn in a way that doesn’t induce any unwanted correlation between the variables?
• Quality of the data: do variables in the data match up to theoretical concepts in the model? Is the supposed explanatory variable imprecisely measured?
• Data generating process and direction of causality: could the same data have been generated by a totally different model? Is there reverse causality, i.e. attitudes determine newspaper choice, so N=f(A)? If so, we are misinterpreting the association
• We assume that other influences on attitudes are not associated with newspaper readership; otherwise we would be mistakenly attributing their influence to that of newspaper readership.
• Sometimes these influences are observed, but other times they are unobserved, i.e. they are omitted variables, e.g. education.
- Broadsheet readers may be better educated and so less hostile.
• Confounding variables: variables that explain the correlation between the variables observed
• Measurement error

The sampling distribution of an estimator
• Our model is yi = α + βxi + εi. We take repeated samples
• We choose some values for α and β. Keeping the values of the xi unchanged, we obtain new observations for the dependent variable yi by drawing a new set of disturbances εi.
• Repeat 2,000 times so we have 2,000 repeated samples. For each repeated sample, we could use an estimator β∗ to calculate an estimate of β
- Because the samples differ, these 2,000 estimates will not be the same. The distribution of these estimates is called the sampling distribution of β∗
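
The repeated-sampling experiment can be sketched in Python as follows. It takes the OLS slope as the estimator β∗, which is an assumption made for this example; the parameter values, the normal disturbances and the function names are likewise illustrative choices.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope: sample covariance of x and y over sample variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def sampling_distribution(alpha, beta, x, sigma, reps, seed=0):
    """Slope estimates from `reps` repeated samples with x held fixed."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # New disturbances each time; the x values never change.
        y = [alpha + beta * xi + rng.gauss(0, sigma) for xi in x]
        estimates.append(ols_slope(x, y))
    return estimates

x = [float(i) for i in range(1, 21)]        # 20 fixed regressor values
draws = sampling_distribution(alpha=1.0, beta=0.5, x=x, sigma=2.0, reps=2000)
print(statistics.mean(draws), statistics.stdev(draws))
```

The list `draws` is an approximation to the sampling distribution: its mean sits close to the chosen β of 0.5, and its standard deviation shows how much a single estimate can vary from sample to sample.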

Unbiasedness
• Estimator β∗ is said to be an unbiased estimator of β if the mean of its sampling distribution is equal to the true parameter β.
• So E(β∗) = β. This means that if we could take infinitely many repeated samples, we would get β on average.
• Bias = E(β∗) − β

Efficiency
• We may have more than one unbiased estimator. The unbiased estimator whose sampling distribution has the smallest variance is called the best unbiased or efficient estimator: we are more certain that the estimate is close to the true value of β
• If we have two unbiased estimators β∗ and β+, β∗ is more efficient than β+ if Var(β∗) < Var(β+)
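
To make the comparison concrete, the Python sketch below pits two unbiased estimators of the slope against each other in repeated samples: the OLS slope and a cruder estimator that draws a line through the first and last observations only. The second estimator and all parameter values are hypothetical choices for illustration.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope: sample covariance of x and y over sample variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def endpoint_slope(x, y):
    """Slope of the line through the first and last observations."""
    return (y[-1] - y[0]) / (x[-1] - x[0])

rng = random.Random(1)                      # fixed seed for reproducibility
x = [float(i) for i in range(1, 21)]        # fixed regressor values
beta = 0.5
ols_draws, end_draws = [], []
for _ in range(2000):
    y = [1.0 + beta * xi + rng.gauss(0, 2.0) for xi in x]
    ols_draws.append(ols_slope(x, y))
    end_draws.append(endpoint_slope(x, y))

var_ols = statistics.variance(ols_draws)
var_end = statistics.variance(end_draws)
print(var_ols < var_end)
```

Both sets of estimates centre on 0.5, but the endpoint estimator throws away 18 of the 20 observations, so its sampling distribution is much more spread out.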

Consistency
• We want to know how an estimator behaves when the sample size gets very large.
• As the sample size changes, the sampling distribution of most estimators changes
• An estimator is consistent if the probability of any given deviation from the true value diminishes towards zero as the sample size gets large
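
Consistency can be seen by simulation. This Python sketch estimates, for several sample sizes, the share of repeated samples in which the OLS slope misses the true β by more than a fixed tolerance. The data generating process, the tolerance of 0.1 and the function names are assumptions made for this example.

```python
import random

def ols_slope(x, y):
    """OLS slope: sample covariance of x and y over sample variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def share_far_from_truth(n, beta=0.5, tol=0.1, reps=300, seed=0):
    """Share of repeated samples of size n with |estimate - beta| > tol."""
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [1.0 + beta * xi + rng.gauss(0, 2.0) for xi in x]
        if abs(ols_slope(x, y) - beta) > tol:
            far += 1
    return far / reps

shares = {n: share_far_from_truth(n) for n in (10, 100, 1000)}
print(shares)
```

The share of large misses falls as n grows, which is the pattern the definition of consistency describes.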

Assumptions of the OLS Estimator
• The OLS estimator exhibits these properties under particular sets of assumptions (Gauss-Markov)
• Properties of the OLS estimator depend on the properties of the error term εi and the explanatory variables xi

• A1: Linear functional form
- Dependent variable can be calculated as a linear function of the independent variables, plus a disturbance term, i.e. we can write the model as yi = α + βxi + εi
- Assumption violated when: wrong regressors, nonlinearity, changing parameters

• A2: Conditional mean-independence/ orthogonality/exogeneity assumption
- Average of εi is zero across all individuals in the population with any given value of xi. So E(εi|xi) = 0 for all xi
- Implies that E(εi) = 0 for all individuals and that εi and xi are uncorrelated: regressor xi does not provide any information about the expected values of the error terms
- A2 is reasonable when values of x are fixed (e.g. set in an experiment), but this is rarely possible in economics
- Can fail because: omitted influences on yi may be correlated with xi, simultaneous influence of yi on xi (reverse causality), measurement error in xi. E.g. for a wage equation, if we omit experience (which is correlated with education and affects wages) from the model, it will appear in the error term

• A3: Homoskedasticity
- Disturbances all have the same variance: V(εi) = σ²
- When A3 holds, we say the errors are homoskedastic. If it fails, we say they are heteroskedastic
- A3 fails when: different groups in the sample have different variances, or the variation in unexplained yi varies with xi
- E.g. variation in unexplained wages might increase with education

• A4: Zero Autocorrelation
- Disturbances not correlated with one another across observations i.e. no auto-correlation between disturbances.
- A4 fails: Observations from similar clusters (family), observations on same individual but different periods (serial correlation)

• A5: Normality of the disturbances
- εi ∼ N(0, σ²)
- A5 fails: disturbances have non-normal distributions


Properties of the OLS estimator
• P1: If A1 & A2 hold, then the OLS estimator β̂ is unbiased
• P2: If A1 & A2 hold, then β̂ is consistent
- N.B. P1 applies in small samples, P2 applies in large samples
• If A1-A4 hold, then the OLS estimator is the best linear unbiased estimator (BLUE) for β, i.e. the linear unbiased estimator with the lowest variance around the true value β
• If A1-A5 hold, then among all estimators β̂ has the lowest variance around the true value β in large samples. This is a desirable large sample property


What if the OLS assumptions are violated?
• A1/A2 violation means the OLS estimator β̂ is neither unbiased nor consistent
• A3/A4 violation means β̂ is still unbiased, but not efficient
• It is difficult to tell whether the assumptions hold, but we can say something about the direction (sign) of the bias
• If A1 is violated, not much can be said; if A2 is violated, the direction depends on the reason for the violation.

• If A2 is violated because omitted positive influences on yi are correlated with xi
- If an omitted positive influence on yi is positively correlated with xi, then OLS will tend to mistakenly attribute the omitted influence to xi, and the OLS estimate is an overestimate of the true β.
- If an omitted positive influence is negatively correlated with xi, then the OLS estimate is an underestimate of the true β
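
Both directions of omitted variable bias can be reproduced in a simulation. In this Python sketch the true slope on x is 0.5, a second variable w has a positive effect on y, and w is left out of the regression. The model, coefficients and function names are hypothetical choices for illustration.

```python
import random

def ols_slope(x, y):
    """OLS slope: sample covariance of x and y over sample variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def short_regression_slope(sign, n=5000, seed=0):
    """Slope from regressing y on x alone when w is left out.

    True model: y = 1 + 0.5*x + 1.0*w + noise, with x = sign*0.8*w + noise,
    so `sign` sets whether x and the omitted w are positively or
    negatively correlated.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [sign * 0.8 * wi + rng.gauss(0, 1) for wi in w]
    y = [1.0 + 0.5 * xi + 1.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return ols_slope(x, y)

slope_pos = short_regression_slope(sign=+1)   # w positively correlated with x
slope_neg = short_regression_slope(sign=-1)   # w negatively correlated with x
print(round(slope_pos, 2), round(slope_neg, 2))
```

The first slope lands well above 0.5 and the second well below it, matching the two cases in the list.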

• If A2 is violated because of simultaneous influence of yi on xi
- If the influence of yi on xi is positive, then the OLS coefficient on xi will be upward biased
- If the influence of yi on xi is negative, then the OLS coefficient on xi will be downward biased

• If A2 is violated because of measurement error in xi
- It can be shown that the coefficient on xi will be biased towards zero (downward in absolute value)
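
The effect of measurement error can be checked by simulation. The Python sketch below assumes classical measurement error (random noise added to the true regressor, independent of everything else), which is the textbook case. The variable names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """OLS slope: sample covariance of x and y over sample variance of x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(0)                       # fixed seed for reproducibility
n = 5000
beta = 0.5
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + beta * xi + rng.gauss(0, 1) for xi in x_true]
# The analyst only sees x with noise added (classical measurement error).
x_obs = [xi + rng.gauss(0, 1) for xi in x_true]

b_true = ols_slope(x_true, y)    # regression on the correctly measured x
b_obs = ols_slope(x_obs, y)      # regression on the mismeasured x
print(round(b_true, 2), round(b_obs, 2))
```

With equal variances for the true regressor and the noise, the slope on the mismeasured regressor comes out at roughly half the true value.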

Regression Model and OLS Estimator

Wages
• Theory says a more productive worker gets a higher wage. We want to know if education raises wages and by how much.
• We have a linear equation: ln wi = α + β edui. We plot log hourly wage against years of education. We use regression analysis to find estimates for α & β. α = the log hourly wage of someone with no education

• We model ln wi instead of wi because a small change in the log wage approximates the proportional change in the wage.
- We can then interpret changes in this quantity as a percent change in the wage. So 100 × β can be interpreted as giving (approximately) the percent change in earnings resulting from a one-year increase in schooling
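
A quick numerical check shows how good the log approximation is and where it breaks down. The helper name `log_vs_percent` and the starting wage in this Python sketch are hypothetical.

```python
import math

def log_vs_percent(wage, pct_rise):
    """Return (exact percent change, 100 x change in log wage)."""
    new_wage = wage * (1 + pct_rise / 100)
    return pct_rise, 100 * (math.log(new_wage) - math.log(wage))

for pct in (1, 5, 10, 50):
    exact, approx = log_vs_percent(wage=12.0, pct_rise=pct)
    print(f"{exact:>3}% rise -> 100 x change in log wage = {approx:.2f}")
```

For small changes the two numbers nearly coincide, but for a 50% rise the log difference is only about 40.5. When an estimated β is large, the exact percent change is 100 × (exp(β) − 1).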

• When we plot the data, we see the relationship is not as exact as the deterministic linear relationship we hypothesised.
- There is a discrepancy between the model and the data because other factors determine wages as well
• We introduce a disturbance/error term εi so we have ln wi = α + β edui + εi
- Now the relationship between wage and education is stochastic

• Why we have an error term
- Omitted factors influencing the dependent variable: e.g. experience affects wage
- Measurement error: e.g. when reporting wage
- Human indeterminacy: Inherent randomness in human behaviour e.g. luck


The general statistical model
• We write the model as yi = α + βxi + εi where yi: dependent variable & xi: independent/explanatory variable/regressor
• α and β are the unknown parameters which we seek to find. α=constant/intercept, β=coefficient of xi
• The existence of εi, whose magnitude is unknown, makes calculating α & β exactly impossible, so we have to estimate them



Estimation
• Let α̃ and β̃ be possible estimates for α and β
• If we have α̃ & β̃ and the x values, we can estimate values of the dependent variable y using: ỹi = α̃ + β̃xi
• ỹi can be subtracted from the actual values (yi) of the dependent variable in the data set to produce residuals (ε̃i)
- ε̃i = yi − ỹi = yi − α̃ − β̃xi
- Residuals can be thought of as estimates of the unknown disturbances inherent in the data set, i.e. of the errors

Ordinary Least Squares (OLS) estimator
• A good estimator should generate a set of estimates of the parameters that makes the residuals small
• The OLS estimator generates the set of values of the parameters (α̃ & β̃) that minimises the sum of squared residuals
• We denote the OLS estimates as α̂ and β̂ (or βOLS)

• FOC for minimisation, called the normal equations:
- (8) Σ(yi − α̂ − β̂xi) = 0
- (9) Σxi(yi − α̂ − β̂xi) = 0
• From (8) we get (10): ȳ = α̂ + β̂x̄, where x̄ and ȳ are the sample means of x and y
• From (9), we get Σxiyi = α̂Σxi + β̂Σxi², and then substitute α̂ = ȳ − β̂x̄ from (10)
• We solve for the slope coefficient β̂ = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², which is the ratio of the sample covariance of y and x to the sample variance of x. The sign of β̂ depends on the sign of the correlation between y and x
- β̂ is a linear estimator: it is a weighted sum of the observations on yi. If β̂ is positive, then x and y are positively correlated
• Given β̂, α̂ is set such that ȳ = α̂ + β̂x̄
- The fitted relationship passes through the means of y and x, and the average residual is zero as the residuals sum to 0
• So, find the means of x and y from the sample, then find β̂, then α̂, then substitute into ŷi = α̂ + β̂xi to get an estimate for y
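
The recipe in the last step can be sketched in a few lines of Python. The function name `ols_fit` and the education and wage figures are hypothetical. The code uses the standard closed-form solution, in which the slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) from the closed-form OLS solution."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx                  # covariance over variance
    alpha_hat = y_bar - beta_hat * x_bar    # line passes through the means
    return alpha_hat, beta_hat

# Hypothetical data: years of education and log hourly wage.
edu = [9, 10, 12, 12, 14, 16, 16, 18]
lnw = [1.9, 2.0, 2.3, 2.1, 2.5, 2.6, 2.9, 3.0]

alpha_hat, beta_hat = ols_fit(edu, lnw)
fitted = [alpha_hat + beta_hat * e for e in edu]
residuals = [w - f for w, f in zip(lnw, fitted)]
print(alpha_hat, beta_hat, sum(residuals))
```

Up to floating-point rounding, the residuals sum to zero and are uncorrelated with the regressor, which is what the two first-order conditions require.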

Estimation and estimators
• No one ever knows the true value of β, so we estimate it, e.g. by OLS, where β̂ is the parameter estimate
• An estimator is the formula/recipe by which data are transformed into an actual estimate
• We want to make residuals small, but how do we define small? Also, what should the weights on the residuals be?
- All residuals weighted equally: minimise the sum of the absolute values of the residuals
- If we think large residuals should be avoided, then use OLS, since squaring penalises large residuals more heavily
- If we think only residuals above a particular threshold should be avoided, then we could place a zero weight on residuals smaller than the critical value

• The OLS estimator is the most popular: it has some desirable properties under certain assumptions and is easily computed.
• Desirable properties are: unbiasedness, consistency & efficiency. Some properties apply to the behaviour of the estimator in small samples, others apply to its behaviour in large samples.
Seller: rileykrane6