By: Thomas Konings
Table of Contents
Part 1: Introduction and Data
Part 2: Event Studies
Workshop 1
Part 3: Basic Regression Analysis
Part 4: Advanced Regression Analysis: Interaction Terms & Panel Data
Part 5: Advanced Regression Analysis: Fama-MacBeth & Binary Choice
Workshop 2
Appendix – What data issues look like
  Heteroscedasticity
  Autocorrelation
  Multicollinearity
© Thomas Konings – 2021 All rights reserved. Reproduction and distribution are prohibited.
Sold through Stuvia 1
Part 1: Introduction and Data
Cross-sectional: comparing multiple units at a given point in time
Time series: track one unit over time
Panel: track multiple units over multiple time periods
Checking data: use summary statistics & graphs to assess data quality (mean, median, std.dev., min,
max, percentiles etc. + histograms). Compare summary stats to other studies with same data.
Check data issues: things that are just illogical (trade data on non-trading days), jumps etc.
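A minimal Python sketch of the summary-statistics check above (the notes themselves use Stata). The return series and the name summarize are hypothetical, chosen only to show how a suspicious jump shows up in max and in the gap between mean and median.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    pct = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "max": ordered[-1],
        "p1": pct[0],
        "p99": pct[98],
    }

# Hypothetical daily returns; the last value is a suspicious jump
returns = [0.01, -0.02, 0.005, 0.0, 0.015, -0.01, 0.45]
stats = summarize(returns)
for name, value in stats.items():
    print(name, value)
```

A mean far above the median, as here, is a cue to look at the individual observations before deciding anything.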
Outliers: [note: in general, do not remove actual data, even if it’s an extreme value]
Transformation: take logs to pull extreme values in towards the mean
Winsorizing: replace extreme values with cutoff values
Truncating: delete extreme observations
Then: recalculate descriptive statistics
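The difference between winsorizing and truncating can be sketched in Python as follows. The 10%/90% cutoffs, the data and the helper names are assumptions for illustration; the percentile rule is simple linear interpolation, one of several conventions.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile on pre-sorted data, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def winsorize(values, lower=0.05, upper=0.95):
    """Replace values outside the cutoffs with the cutoff values."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]

def truncate(values, lower=0.05, upper=0.95):
    """Drop values outside the cutoffs."""
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [v for v in values if lo_cut <= v <= hi_cut]

# Hypothetical data with one extreme value in each tail
data = [-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
w = winsorize(data, 0.1, 0.9)  # same number of observations, tails capped
t = truncate(data, 0.1, 0.9)   # fewer observations, tails removed
print(len(w), min(w), max(w))
print(len(t), min(t), max(t))
```

Winsorizing keeps the sample size, truncating shrinks it; either way the descriptive statistics should be recomputed afterwards.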
Missing data: unbalanced does not mean unusable. Do not interpolate data (introduces bias), do not
replace values by zero (unless missing is a proxy for zero). Do you need the control variable if it
leaves you with only 10% of your data?
Prices: denoted $p$, with subscript $t$ for the time period.
Returns: $R_t = \frac{p_t - p_{t-1}}{p_{t-1}}$
Note: the return over two periods is obtained by multiplying gross returns, $(1+R_1)(1+R_2) - 1$; simple returns are not additive.
Log returns: continuously compounded returns, $r_t = \ln(p_t) - \ln(p_{t-1})$; these are additive across periods. The difference from simple returns is small for small returns (e.g. daily), so weighting log returns by portfolio weights also works approximately.
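A short Python check of the two statements above, using hypothetical prices: simple returns compound multiplicatively, while log returns sum across periods.

```python
import math

prices = [100.0, 110.0, 99.0]  # hypothetical p_0, p_1, p_2

# Simple returns: R_t = (p_t - p_{t-1}) / p_{t-1}
simple = [(prices[t] - prices[t - 1]) / prices[t - 1] for t in range(1, len(prices))]
# Log returns: r_t = ln(p_t) - ln(p_{t-1})
log_r = [math.log(prices[t]) - math.log(prices[t - 1]) for t in range(1, len(prices))]

two_period_simple = (1 + simple[0]) * (1 + simple[1]) - 1  # compounding
two_period_log = sum(log_r)                                # adding

print(sum(simple))            # naive sum of simple returns: about 0, not the true return
print(two_period_simple)      # about -0.01, matches 99/100 - 1
print(math.exp(two_period_log) - 1)  # converts the summed log return back
```

Adding the simple returns (+10% and -10%) would suggest no change, although the price fell from 100 to 99.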
Incorporating dividends: $R_t = \frac{p_t + d_t - p_{t-1}}{p_{t-1}}$ ($d_t$ being the dividend, paid before the date-$t$ price)
→ Dividend return: $d_t / p_{t-1}$
Excess returns: in excess of the risk-free rate, or in excess of some other portfolio (payoff of an arbitrage portfolio)
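As a worked example in Python, with hypothetical numbers for the prices, dividend and risk-free rate: the total return splits into a capital gain and a dividend return, and the excess return subtracts the risk-free rate for the same period.

```python
def total_return(p_prev, p_now, dividend=0.0):
    """R_t = (p_t + d_t - p_{t-1}) / p_{t-1}."""
    return (p_now + dividend - p_prev) / p_prev

p_prev, p_now, d = 50.0, 52.0, 1.0  # hypothetical prices and dividend
rf = 0.01                           # hypothetical risk-free rate, same period

r_total = total_return(p_prev, p_now, d)  # price change plus dividend
r_capital = total_return(p_prev, p_now)   # price change only
r_dividend = d / p_prev                   # dividend return d_t / p_{t-1}
excess = r_total - rf                     # return in excess of risk-free rate

print(r_total, r_capital, r_dividend, excess)
```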
Returns are random and not known in advance. They have a distribution, characterized by its mean,
variance and higher moments (skewness/kurtosis), which are not observed, only estimated.
Theories are about expected rather than average returns. In Finance, volatility = σ = sqrt(variance)
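A sketch of how these moments are estimated from a sample, in Python. The sample and the function name moments are hypothetical; skewness and kurtosis use the simple (biased) moment ratios, which is one convention among several.

```python
import math

def moments(returns):
    """Estimate mean, variance, volatility, skewness and kurtosis."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)  # sample variance
    vol = math.sqrt(var)                                   # volatility = sqrt(variance)
    m2 = sum((r - mean) ** 2 for r in returns) / n         # second central moment
    skew = sum((r - mean) ** 3 for r in returns) / n / m2 ** 1.5
    kurt = sum((r - mean) ** 4 for r in returns) / n / m2 ** 2
    return mean, var, vol, skew, kurt

sample = [0.02, -0.01, 0.03, 0.00, -0.04]  # hypothetical returns
mean, var, vol, skew, kurt = moments(sample)
print(mean, var, vol, skew, kurt)
```

The sample mean is only an estimate of the expected return; a different sample from the same distribution would give a different number.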
Portfolios can be equally-weighted (1/N), value-weighted (by market cap) or price-weighted (equal
number of shares of each firm, e.g. the Dow Jones). Scholars: sort stocks into portfolios, then look at 10% (decile)
portfolios (e.g. the 10% largest), which can be equally-weighted or value-weighted.
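The three weighting schemes can be compared in a small Python sketch. The three-stock universe is hypothetical; price weights are computed as price divided by the sum of prices, which is what holding one share of each firm implies.

```python
# Hypothetical universe: price, shares outstanding, and period return
stocks = {
    "A": {"price": 10.0, "shares": 1000, "ret": 0.10},
    "B": {"price": 50.0, "shares": 100, "ret": 0.00},
    "C": {"price": 40.0, "shares": 500, "ret": -0.05},
}

def portfolio_return(weights):
    """Weighted sum of the stock returns."""
    return sum(weights[k] * stocks[k]["ret"] for k in stocks)

n = len(stocks)
equal_w = {k: 1 / n for k in stocks}  # 1/N

total_cap = sum(s["price"] * s["shares"] for s in stocks.values())
value_w = {k: s["price"] * s["shares"] / total_cap for k, s in stocks.items()}

total_price = sum(s["price"] for s in stocks.values())
price_w = {k: s["price"] / total_price for k, s in stocks.items()}

r_equal = portfolio_return(equal_w)
r_value = portfolio_return(value_w)
r_price = portfolio_return(price_w)
print(r_equal, r_value, r_price)
```

The same three stocks give a positive, a roughly zero and a negative portfolio return depending on the weighting scheme.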
Data organization: depends on type of data and analysis. Time series: variables in columns, dates in
rows. Cross-sectional: variables in columns, firms in rows. Panel: variables in columns, dates & firm
identifiers in rows. Note: keep firm identifiers and names, better for output (readability)
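To make the panel layout concrete, here is a Python sketch with hypothetical firm-month rows. It also shows what the same data looks like in wide form, where an unbalanced panel appears as a missing cell. The helper to_wide is an illustration only, not a Stata command.

```python
# Long (panel) layout: one row per firm-month, variables in columns
long_rows = [
    {"firm": "AAA", "month": "2020-01", "ret": 0.02},
    {"firm": "AAA", "month": "2020-02", "ret": -0.01},
    {"firm": "BBB", "month": "2020-01", "ret": 0.03},
    # BBB has no 2020-02 observation: the panel is unbalanced
]

def to_wide(rows, id_key, time_key, value_key):
    """Pivot long rows into {time: {id: value}}, with None for missing cells."""
    ids = sorted({r[id_key] for r in rows})
    wide = {}
    for r in rows:
        row = wide.setdefault(r[time_key], {i: None for i in ids})
        row[r[id_key]] = r[value_key]
    return wide

wide = to_wide(long_rows, "firm", "month", "ret")
for month, row in sorted(wide.items()):
    print(month, row)
```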
STATA Summary
Create a consistent monthly time indicator from a daily date:
    gen month = mofd(date)
    format month %tm
Optionally: tsset month for a single time series, or xtset firm month for a panel (sets the time variable, allows lags)
Convert panel data between long and wide: reshape
Note: type help [command] in Stata for docs
Merge: single firm/month match:
    merge 1:1 firm month using "dataset2.dta"
or per month for all firms:
    merge m:1 month using "dataset3.dta"
Append more data (rows):
    append using "additionalrows.dta"
If-statements: can slap these on the end of most commands, e.g. regress y x if x > 4
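What a many-to-one merge does can be mimicked in a few lines of Python. This is only a sketch of the idea with hypothetical data and variable names (mkt is invented here), not how Stata implements merge: every firm-month row picks up the single market-level row for its month.

```python
# Master data: one row per firm-month (many rows per month)
firm_month = [
    {"firm": "AAA", "month": 1, "ret": 0.02},
    {"firm": "BBB", "month": 1, "ret": 0.03},
    {"firm": "AAA", "month": 2, "ret": -0.01},
]

# Using data: exactly one row per month, keyed by month
market = {1: {"mkt": 0.015}, 2: {"mkt": -0.005}}

# Many-to-one: each firm-month row is extended with its month's market row
merged = [{**row, **market.get(row["month"], {})} for row in firm_month]

for row in merged:
    print(row)
```

A one-to-one merge is the special case where the key (here firm and month together) identifies at most one row in both datasets.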