Summary

Applied Economics Summary

Uploaded on: 19-05-2021
Written in: 2018/2019

An intro to statistics and econometrics for economics students in their initial years.


School, study and subject

Institution: University College London (UCL)
Study: University College London
Course: Applied Economics


Document information

Number of pages: 16
Type: Summary

Topics

stats
statistics
economics
simple
summary

Content preview

Introduction to Data, Economic Modelling and Econometrics

Economic models help us to understand economic phenomena and forecast changes. Models suggest a relationship. We use data and statistical tools to
test these relationships and their magnitude. Tools used in econometrics are statistical in nature.

What data do applied economists work with?
• Datasets are collections of realisations of random variables
- Datasets usually include several variables X, Y , Z ...
- We have value for these variables for each of the N observations in the dataset
- Each observation is a realisation of the random variable
- (X1, X2, ..., XN), (Y1, Y2, ..., YN), (Z1, Z2, ..., ZN)

• Data differ by their unit of observation (or level):
- Individual person, household, firm
- Aggregated at geographical areas, e.g. countries

Data used in applied economics
• Time series data
- Same unit observed at different points in time
- Good for investigating effects of variables which vary over time e.g. stock prices, inflation. Ordered chronologically
- Used in applications of macro-economic models e.g. UK economic growth between 1979 and 2012
• Cross sectional data
- Data from units observed at the same time. The timing does not have to be exact, e.g. different weeks of the same year. Ordering doesn’t matter
- Good for investigating relationships between variables which vary between individuals at any point in time
- E.g. productivity and output of firms in the UK at one point in time, wages and education of workers, incomes
• Combination of cross-sectional and time series data
- If these are linked by observational unit, they are called panel or longitudinal data. Consists of a time series for each cross-sectional member in the
data
- Example: longitudinal surveys that follow the same cohort of individuals from childhood through old age (family environment, schooling, wages,
retirement etc.)
- If not linked, then it is repeated cross sections, e.g. a 1985 survey of households on income, savings etc., followed by a new 1990 survey of different households
with the same questions.
- Main difference is that in a panel, the same units (e.g. individuals, firms) are followed over a given time period.
- So for houses sold in 1993 and 1995, it is not a panel, as the houses sold are different
- Good for investigating life-cycle phenomena and evaluating gov policy

Sources of data used in applied economics
• Most popular source of data used in economics is survey data
- Collected on a sample of the population of interest but samples not always representative of the population
- Typically rely on surveys collected by a third party (e.g. government) but becoming popular to collect primary data

• Economists also increasingly use administrative or register based data
- Data are on the entire population rather than a sample
- Administrative data are not collected for the purpose of research, but for statistical or accounting purposes
- Advantages: Very large number of observations, often very precise measurements
- Disadvantages: Small number of variables, often very confidential

• Empirical analysis uses data to test a theory or estimate a relationship. We may construct a formal economic model, which consists of mathematical equations that
describe relationships. We then turn it into an econometric model by specifying the form of the function and how to deal with variables that are too hard to
observe. We then collect data on the variables, estimate the parameters and test hypotheses. Other times, we can create an informal model using
intuition.

• We try to determine whether a variable has a causal effect on another variable. But be careful with correlation and causation. We use the ceteris paribus notion
(holding other relevant factors fixed) to determine a causal effect.

Economic Theoretical Model
• Production function: $Y = A K^{\alpha} (hL)^{1-\alpha}$, where h is human capital, total labour input is hL, A is a measure of productivity and α is capital’s share of income, 0<α<1

• Per-worker production function: $y = A\,k^{\alpha} h^{1-\alpha}$, i.e. output = productivity x factors of production
• We want to estimate the income ratio between two economies (1 and 2): $\frac{y_1}{y_2} = \frac{A_1}{A_2} \times \frac{k_1^{\alpha} h_1^{1-\alpha}}{k_2^{\alpha} h_2^{1-\alpha}}$, i.e. ratio of output = ratio of productivity x ratio of factors of production
• How to measure y, k, h? Where to find the data? Empirical proxy for α? The larger the ratio, the greater the income inequality.
• y: GDP per capita, k: physical capital per worker, h: number of years of schooling. Income differences are due to productivity and factor differences.

• We transform y into an equation relating growth rates: $\hat{y} = \hat{A} + \alpha\hat{k} + (1-\alpha)\hat{h}$, so growth rate of output = growth rate of productivity + growth rate of factors of production
- where (^) means the growth rate of that variable
• We can rearrange this for A^, giving $\hat{A} = \hat{y} - \alpha\hat{k} - (1-\alpha)\hat{h}$, since y, k and h can be measured. This is called growth accounting. A^ is the Solow residual
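
The growth-accounting step can be sketched numerically. This is a minimal illustration that assumes the per-worker Cobb-Douglas form y = A k^α h^(1−α); the function name `solow_residual`, the growth rates and the value α = 1/3 are all hypothetical and are not taken from the notes.

```python
def solow_residual(g_y, g_k, g_h, alpha):
    """Productivity growth implied by g_y = g_A + alpha*g_k + (1 - alpha)*g_h."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return g_y - alpha * g_k - (1 - alpha) * g_h

# Hypothetical annual growth rates (illustrative numbers only).
g_A = solow_residual(g_y=0.03, g_k=0.04, g_h=0.01, alpha=1 / 3)
print(f"Implied productivity growth (Solow residual): {g_A:.4f}")
```

Because A is not observed directly, it is backed out as whatever output growth is left after the measured factor contributions are removed.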

• Correlation is only a statement of numerical fact; it says nothing about cause and effect. Correlation doesn’t equal causality.
• Consider a positive correlation between X and Y.
- X causes Y. X affects Y – causation is running from X to Y. e.g. rain and umbrella
- Causation can go either way (reverse causality). e.g. health and income
- There is no direct causal relationship between X and Y. But some third variable, Z, causes both X and Y. In this case, Z is called an omitted variable
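
The third case can be illustrated with a small simulation. It is only a sketch under assumed conditions: X and Y each depend on a common variable Z plus independent noise, and neither affects the other. The sample size, seed and variable names are arbitrary choices.

```python
import random
import statistics

def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]      # third variable Z
x = [zi + rng.gauss(0, 1) for zi in z]       # X is driven by Z only
y = [zi + rng.gauss(0, 1) for zi in z]       # Y is driven by Z only

# Strip out Z's contribution: what remains of X and Y is independent noise.
x_net = [xi - zi for xi, zi in zip(x, z)]
y_net = [yi - zi for yi, zi in zip(y, z)]

corr_xy = correlation(x, y)
corr_net = correlation(x_net, y_net)
print(f"corr(X, Y) = {corr_xy:.2f}, after removing Z = {corr_net:.2f}")
```

X and Y are clearly correlated even though neither causes the other, and the correlation disappears once Z is accounted for.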

e.g. Data description: readers of tabloids are more hostile to immigration. Data interpretation: we want to know the extent to which tabloid reading means
you are more hostile to immigration. We create a theory of the causal process A=f(N) i.e. attitudes to immigration are a function of newspaper readership.

Considerations
• Representativeness of the sample: has the sample been drawn in a way such that it does not induce any unwanted correlation between the variables?
• Quality of the data: do variables in the data match up to theoretical concepts in the model? Is the supposed explanatory variable imprecisely measured?
• Data generating process and direction of causality: could the same data have been generated by a totally different model? Is there reverse causality, i.e.
attitudes determine newspaper choice so N=f(A)? If so, we are misinterpreting the association
• Assume that other influences on attitudes are not associated with newspaper readership, else we would be mistakenly attributing their influence to that of
newspaper readership.
• Sometimes these influences are observed but other times they are unobserved, i.e. are omitted variables, e.g. education.
- Broadsheet readers are better educated so less hostile.
• Confounding variables: explain the correlation between the variables observed
• Measurement error

The sampling distribution of an estimator
• Our model is yi = α + βxi + εi. We take repeated samples
• We choose some values for α and β. Keeping the values of the xi unchanged, we obtain new observations for the dependent variable yi by drawing a new
set of disturbances εi.
• Repeat 2,000 times so we have 2,000 repeated samples. For each repeated sample, we could use an estimator β∗ to calculate an estimate of β
- Because the samples differ, these 2,000 estimates will not be the same. The distribution of these estimates is called the sampling distribution of β∗
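
The repeated-sampling experiment above can be sketched as follows. The choices here are assumptions made for illustration: the estimator β∗ is taken to be the OLS slope, the true values are α = 1 and β = 2, the disturbances are standard normal, and there are 50 fixed regressor values.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope estimate: sample covariance of x and y over sample variance of x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(42)
alpha, beta = 1.0, 2.0                          # assumed "true" parameters
xs = [rng.uniform(0, 10) for _ in range(50)]    # regressors, held fixed across samples

estimates = []
for _ in range(2000):                           # 2,000 repeated samples
    # Draw a new set of disturbances, hence new observations on y.
    ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
    estimates.append(ols_slope(xs, ys))

mean_est = statistics.mean(estimates)           # centre of the sampling distribution
sd_est = statistics.stdev(estimates)            # its spread
print(f"mean of estimates = {mean_est:.3f}, standard deviation = {sd_est:.3f}")
```

The list `estimates` is an empirical approximation to the sampling distribution; its mean and spread are the quantities the unbiasedness and efficiency definitions refer to.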

Unbiasedness
• Estimator β∗ is said to be an unbiased estimator of β if the mean of its sampling distribution is equal to the true parameter β.
• So E(β∗) = β. This means that if we could take infinitely many repeated samples, we would get β on average.
• Bias = E(β∗) − β

Efficiency
• We may have more than one unbiased estimator. The unbiased estimator whose sampling distribution has the smallest variance is called the best unbiased or
efficient estimator: with it we are more certain that the estimate is close to the true value of β
• If we have two estimators β∗ and β+ , β∗ is more efficient than β+ if Var(β∗) < Var(β+)

Consistency
• We want to know how an estimator behaves when the sample size gets very large.
• As the sample size changes, sampling distribution of most estimators changes
• Estimator is consistent if probability of any deviation from true value diminishes towards zero as sample size gets large
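
Consistency can be checked by simulation. The sketch below assumes the OLS slope as the estimator, a true β of 2, standard normal regressors and disturbances, and a tolerance of 0.1; all of these are illustrative choices rather than part of the notes.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope estimate: sample covariance of x and y over sample variance of x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def share_far_from_truth(n, reps=300, beta=2.0, tol=0.1, seed=1):
    """Fraction of simulated samples of size n whose estimate misses beta by more than tol."""
    rng = random.Random(seed)
    far = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [1.0 + beta * x + rng.gauss(0, 1) for x in xs]
        if abs(ols_slope(xs, ys) - beta) > tol:
            far += 1
    return far / reps

shares = {n: share_far_from_truth(n) for n in (20, 200, 1000)}
for n, share in shares.items():
    print(f"n = {n:4d}: share of estimates off by more than 0.1 = {share:.3f}")
```

The share of estimates that deviate from the true value by more than the tolerance falls towards zero as n grows, which is what consistency requires.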

Assumptions of the OLS Estimator
• The OLS estimator exhibits these properties under particular sets of assumptions (Gauss-Markov)
• Properties of the OLS estimator depend on the properties of the error term εi and the explanatory variable xi

• A1: Linear functional form
- Dependent variable can be calculated as a linear function of independent variables, plus disturbance term. I.e can write model as yi = α + βxi + εi
- Assumption violated when: wrong regressors, nonlinearity, changing parameters

• A2: Conditional mean-independence/ orthogonality/exogeneity assumption
- Average of εi is zero across all individuals in the population with any value of xi . So E(εi|xi) = 0 for all xi
- implies that E(εi) = 0 for all individuals and εi and xi are uncorrelated: regressor xi does not provide any information about the expected values of
the error terms
- A2 is reasonable when the values of x are fixed (set by the researcher), but this is rarely possible in economics.
- Can fail because: omitted influences on yi may be correlated with xi, simultaneous influence of yi on xi (reverse causality), measurement error in
xi. E.g. for a wage equation, if we omit experience from the model (which is correlated with education and affects wages), it will appear in the error
term.

• A3: Homoskedasticity
- Disturbances all have the same variance: Var(εi) = σ² for all i
- When A3 holds, we say the errors are homoskedastic. If fails, we say heteroskedastic
- A3 fails: Different groups in sample could have different variances, variation in unexplained yi might vary with xi
- E.g. variation in unexplained wages might increase with education

• A4: Zero Autocorrelation
- Disturbances not correlated with one another across observations i.e. no auto-correlation between disturbances.
- A4 fails: Observations from similar clusters (family), observations on same individual but different periods (serial correlation)

• A5: Normality of the disturbances
- εi ∼ N(0, σ²)
- A5 fails: disturbances have non-normal distributions

Properties of the OLS estimator
• P1: If A1 & A2 hold, then the OLS estimator β̂ is unbiased
• P2: If A1 & A2 hold, then β̂ is consistent
- n.b. P1 applies in small samples, P2 applies in large samples
• If A1-A4 hold, then the OLS estimator is the best linear unbiased estimator (BLUE) for β, i.e. the linear unbiased estimator with the lowest variance around the true
value β
• If A1-A5 hold, then, among all estimators, β̂ has the lowest variance around the true value β in large samples. This is a desirable large sample
property

What if the OLS assumptions are violated?
• A1/A2 violation means β̂ is neither unbiased nor consistent
• A3/A4 violation means β̂ is still unbiased, but not efficient
• It is difficult to tell whether the assumptions hold, but we can sometimes say something about the direction (sign) of the bias
• If A1 is violated, not much can be said; if A2 is violated, the direction depends on the reason why it is violated.

• If A2 is violated because omitted positive influences on yi are correlated with xi
- If an omitted positive influence on yi is positively correlated with xi, then OLS will tend to mistakenly attribute the omitted influence to xi and β̂ is
an overestimate of the true β.
- If an omitted positive influence is negatively correlated with xi, then β̂ is an underestimate of the true β
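
The direction of omitted-variable bias can be sketched by simulation. The setup is assumed purely for illustration: the true model is y = βx + γw + ε with β = 1 and γ = 1 (so w is a positive influence on y), w is left out of the regression, and x = c·w + noise so that the sign of c sets the sign of the correlation between x and w.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope estimate: sample covariance of x and y over sample variance of x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def slope_omitting_w(c, n=5000, beta=1.0, gamma=1.0, seed=0):
    """Regress y on x alone when the omitted w also affects y and x = c*w + noise."""
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]                  # omitted influence
    x = [c * wi + rng.gauss(0, 1) for wi in w]               # regressor correlated with w
    y = [beta * xi + gamma * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]
    return ols_slope(x, y)

slope_pos = slope_omitting_w(c=1.0)     # w positively correlated with x
slope_neg = slope_omitting_w(c=-1.0)    # w negatively correlated with x
print(f"true beta = 1.0, positive correlation: {slope_pos:.2f}, negative: {slope_neg:.2f}")
```

With a positive correlation the estimated slope lies above the true value of 1, and with a negative correlation it lies below it.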

• If A2 is violated because of simultaneous influence of yi on xi
- If the influence of yi on xi is positive, then the OLS coefficient on xi will be upward biased
- If the influence of yi on xi is negative, then OLS coefficient on xi will be downward biased

• If A2 is violated because of measurement error in xi
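- It can be shown that the coefficient on xi will be downward biased in magnitude, i.e. biased towards zero (attenuation), when the measurement error is random noise unrelated to the true xi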
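
A simulation sketch of the measurement-error case follows. It assumes classical measurement error: the observed regressor equals the true regressor plus independent noise of the same variance, with a true β of 2. These numbers are illustrative assumptions only.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Slope estimate: sample covariance of x and y over sample variance of x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(7)
n, beta = 5000, 2.0
x_true = [rng.gauss(0, 1) for _ in range(n)]             # correctly measured regressor
y = [beta * x + rng.gauss(0, 1) for x in x_true]         # y depends on the true x
x_obs = [x + rng.gauss(0, 1) for x in x_true]            # what we observe: x plus noise

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_obs, y)
print(f"slope with true x = {slope_clean:.2f}, with mismeasured x = {slope_noisy:.2f}")
```

In this setup the noise doubles the variance of the regressor without adding any covariance with y, so the estimated slope is pulled roughly halfway towards zero.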

Regression Model and OLS Estimator

Wages
• Theory says more productive worker gets higher wage. We want to know if education raises wages and by how much.
• We have a linear equation: ln wi = α + β edui. We plot log hourly wage against years of education. We use regression
analysis to find estimates of α & β. α = the log hourly wage of someone with no education

• We model ln wi instead of wi because a small change in the log wage approximates the proportional change in the wage.
- We can then interpret changes in this quantity as a percent change in the wage. So 100 x β can be interpreted as giving
approximately the percent change in earnings resulting from a one-year increase in schooling
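
How good the log approximation is can be checked directly. The helper name and the β values below are hypothetical; the sketch compares the approximate percent change 100 x β with the exact change implied by a shift of β in ln(wage).

```python
import math

def exact_pct_change(log_diff):
    """Exact percent change in the wage implied by a change of log_diff in ln(wage)."""
    return (math.exp(log_diff) - 1) * 100

# Hypothetical coefficients on years of education.
for beta in (0.01, 0.05, 0.10, 0.30):
    approx = 100 * beta
    exact = exact_pct_change(beta)
    print(f"beta = {beta:.2f}: approx {approx:.1f}%, exact {exact:.2f}%")
```

The two figures are close for small β and drift apart as β grows, which is why the percent interpretation is described as an approximation.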

• When we plot the data, we see the relationship is not as exact as the deterministic linear relationship we hypothesised.
- There is a discrepancy between the model and the data because other factors determine wages as well
• We introduce a disturbance/ error term εi so we have lnwi =α+βedui +εi
- Now the relationship between wage and education is stochastic

• Why we have an error term
- Omitted factors influencing the dependent variable: e.g. experience affects wage
- Measurement error: e.g. when reporting wage
- Human indeterminacy: Inherent randomness in human behaviour e.g. luck

The general statistical model
• We write the model as yi = α + βxi + εi where yi: dependent variable & xi: independent/explanatory variable/regressor
• α and β are the unknown parameters which we seek to find. α=constant/intercept, β=coefficient of xi
• The existence of εi, whose magnitude is unknown, makes calculating α & β exactly impossible, so we have to estimate them

Estimation
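• Let $\tilde{\alpha}$ and $\tilde{\beta}$ be possible estimates for α and β
• If we have $\tilde{\alpha}$ & $\tilde{\beta}$ and the true x values, we can estimate values of the dependent variable y using: $\tilde{y}_i = \tilde{\alpha} + \tilde{\beta} x_i$
• $\tilde{y}_i$ can be subtracted from the actual values ($y_i$) of the dependent variable in the data set to produce residuals ($\tilde{\varepsilon}_i$)
- $\tilde{\varepsilon}_i = y_i - \tilde{y}_i = y_i - \tilde{\alpha} - \tilde{\beta} x_i$
- residuals can be thought of as estimates of the unknown disturbances inherent in the data set, i.e. of the errors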

Ordinary Least Squares (OLS) estimator
• A good estimator should generate a set of estimates of parameters that makes the residuals small
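• The OLS estimator generates the set of values of the parameters ($\tilde{\alpha}$ & $\tilde{\beta}$) that minimises the sum of squared residuals $S = \sum_{i=1}^{N} (y_i - \tilde{\alpha} - \tilde{\beta} x_i)^2$
• We denote the OLS estimates as $\hat{\alpha}$ and $\hat{\beta}$ (or βOLS)

• FOC for minimisation, called the normal equations:
- $\sum_{i} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$ (8)
- $\sum_{i} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$ (9)
• From (8) we get $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ (10), where $\bar{x}$ and $\bar{y}$ are the sample means of x and y

• From (9), we get $\sum_{i} x_i y_i = \hat{\alpha} \sum_{i} x_i + \hat{\beta} \sum_{i} x_i^2$, and then sub in (10): $\sum_{i} x_i (y_i - \bar{y}) = \hat{\beta} \sum_{i} x_i (x_i - \bar{x})$
• We solve for the slope coefficient $\hat{\beta} = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}$, which is the ratio of the sample covariance of y and x to the sample
variance of x. The sign of $\hat{\beta}$ depends on the sign of the correlation between y and x
- $\hat{\beta}$ is a linear estimate: it is a weighted sum of the observations on $y_i$. If $\hat{\beta}$ is positive, then x and y are positively correlated in the sample
• Given $\hat{\beta}$, $\hat{\alpha}$ is set such that $\bar{y} = \hat{\alpha} + \hat{\beta}\bar{x}$
- The fitted relationship passes through the means of y and x, and the average residual is zero as the residuals sum to 0.
• So, find the means of x and y from the sample, then find $\hat{\beta}$, then $\hat{\alpha}$, then sub into $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$ to get an estimate for y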
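
The recipe above (sample means, then the slope as sample covariance over sample variance, then the intercept) can be sketched in a few lines. The data are hypothetical years of education and log hourly wages, invented only to make the example runnable.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) from the closed-form OLS formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx                    # covariance of x and y over variance of x
    intercept = y_bar - slope * x_bar      # fitted line passes through the sample means
    return intercept, slope

# Hypothetical sample: years of education and log hourly wage.
edu = [10, 12, 12, 14, 16, 18]
ln_wage = [2.0, 2.2, 2.3, 2.5, 2.6, 2.9]

alpha_hat, beta_hat = ols_fit(edu, ln_wage)
fitted = [alpha_hat + beta_hat * x for x in edu]
residuals = [y - f for y, f in zip(ln_wage, fitted)]

print(f"intercept = {alpha_hat:.4f}, slope = {beta_hat:.4f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```

The residuals sum to zero up to floating-point error, and they are also uncorrelated with the regressor in the sample; these are the two first-order conditions of the minimisation.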

Estimation and estimators
• No one ever knows the true value of β, so we estimate it, e.g. by OLS, where β̂ is the parameter estimate
• Estimator is formula/ recipe by which data is transformed into an actual estimate
• We want to make residuals small but how to define small? Also, what should weights be for residuals?
- All residuals weighted equally: minimise sum of absolute values of these residuals
- If we think large residuals should be avoided, then use OLS
- If we think residuals above particular threshold should be avoided, then could place a zero weight on residuals
smaller than critical value

• The OLS estimator is the most popular: it has some desirable properties under certain assumptions & is easily computed.
• Desirable properties are: unbiasedness, consistency & efficiency. Some properties apply to the behaviour of the
estimator in small samples, others apply to its behaviour in large samples.


Seller: rileykrane6
