Lecture 1: Introduction and Descriptive statistics
Types of data
1. Cross-sectional data
2. Time series data
3. Panel data, a combination of 1 & 2
—
Cross-sectional data
Are data on individuals, firms, households, cities, states, countries, or other units of interest at a
single point in time or in a given period
Observations are more or less independent of one another
Examples:
- A list of grades scored by a class of students on a
particular test
- A list of daily returns for a specific date of stocks on
the New York Stock Exchange
- A sample of bond credit ratings for UK banks in
2021
Representative cross-sectional data are obtained by
random sampling from the underlying population
—
Time series data
Are observations of one or several variables over time
They are typically serially correlated
Examples:
Series (typical frequency)
- GDP or unemployment: monthly or quarterly
- Government budget deficit: annually
- Money supply: weekly
- Value of stock market index: as transactions occur
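Serial correlation can be made concrete with a small numerical sketch. The example below is illustrative only: the AR(1) series, the coefficient 0.8 and the helper name autocorrelation are assumptions made for the demonstration, not part of the lecture. It compares the lag-1 sample autocorrelation of a persistent series with that of independent noise.

```python
import random


def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    # Numerator: cross-products of deviations that are `lag` periods apart.
    numer = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numer / denom


random.seed(0)
# Independent shocks: the kind of data where ordering carries no information.
shocks = [random.gauss(0, 1) for _ in range(2000)]

# Hypothetical AR(1) series: each value carries over 0.8 of the previous one.
ar1 = [shocks[0]]
for e in shocks[1:]:
    ar1.append(0.8 * ar1[-1] + e)

rho_ar1 = autocorrelation(ar1)       # expected to be close to 0.8
rho_noise = autocorrelation(shocks)  # expected to be close to 0
print(f"lag-1 autocorrelation, AR(1): {rho_ar1:.3f}")
print(f"lag-1 autocorrelation, noise: {rho_noise:.3f}")
```

A value well away from zero for the first series and near zero for the second is what "serially correlated" versus "more or less independent" looks like in a sample.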
Panel or longitudinal data
The same cross-sectional units are followed over time
Panel data have both a cross-sectional and a time
series dimension
- Used to account for time-invariant
unobservables
- Used to model lagged responses
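One way to see how panel data account for time-invariant unobservables is the within (demeaning) transformation. The sketch below is a toy illustration under stated assumptions: the two firms, the numbers, and the helper names slope and demean_by_unit are all hypothetical. Each firm has an unobserved fixed effect, and y rises by 2 for every unit of x within a firm.

```python
from collections import defaultdict

# Hypothetical panel: two firms followed over three years.
# y = firm_effect + 2 * x, with firm_effect = 10 for A and -5 for B.
panel = [
    {"firm": "A", "year": 2019, "x": 1.0, "y": 12.0},
    {"firm": "A", "year": 2020, "x": 2.0, "y": 14.0},
    {"firm": "A", "year": 2021, "x": 3.0, "y": 16.0},
    {"firm": "B", "year": 2019, "x": 4.0, "y": 3.0},
    {"firm": "B", "year": 2020, "x": 5.0, "y": 5.0},
    {"firm": "B", "year": 2021, "x": 6.0, "y": 7.0},
]


def slope(xs, ys):
    """OLS slope of y on x (regression with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def demean_by_unit(rows, key):
    """Subtract each firm's own time average from `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["firm"]].append(row[key])
    means = {firm: sum(v) / len(v) for firm, v in groups.items()}
    return [row[key] - means[row["firm"]] for row in rows]


# Pooling all observations ignores the firm effects and distorts the slope.
pooled_slope = slope([r["x"] for r in panel], [r["y"] for r in panel])

# Demeaning within each firm removes anything that is constant over time.
within_slope = slope(demean_by_unit(panel, "x"), demean_by_unit(panel, "y"))

print(f"pooled slope: {pooled_slope:.3f}")
print(f"within slope: {within_slope:.3f}")
```

In this constructed example the pooled slope even has the wrong sign, while the within slope recovers the value 2 used to build the data. With a single cross-section the firm effects could not be separated from the effect of x.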
—
Frequency distributions
A tabular display of data summarised into intervals is known as a frequency distribution.
Histograms
are the graphical representation of a frequency distribution.
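Probability density function (pdf)
is a statistical expression that defines a probability distribution (the likelihood of an outcome) for a
continuous random variable (e.g., a stock return); the analogue for a discrete random variable is the probability mass function
As the sample becomes larger and the intervals narrower, the histogram (scaled to unit area) approaches the pdf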
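The link between a frequency distribution, its histogram and the pdf can be checked numerically. The following is a minimal sketch under stated assumptions: the data are simulated standard normal draws, the interval grid is arbitrary, and the function names frequency_distribution, normal_pdf and max_gap are made up for the illustration.

```python
import math
import random


def frequency_distribution(data, low, high, n_bins):
    """Count observations in n_bins equal-width intervals on [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        if low <= x < high:
            # min() guards against rounding pushing x into a bin past the last.
            counts[min(int((x - low) / width), n_bins - 1)] += 1
    return counts, width


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def max_gap(n_obs, seed=1):
    """Largest gap between histogram height and the pdf at interval midpoints."""
    rng = random.Random(seed)
    data = [rng.gauss(0, 1) for _ in range(n_obs)]
    counts, width = frequency_distribution(data, -4.0, 4.0, 16)
    gaps = []
    for i, count in enumerate(counts):
        midpoint = -4.0 + (i + 0.5) * width
        # Relative frequency per unit of x, so the bars have total area of about 1.
        density = count / (n_obs * width)
        gaps.append(abs(density - normal_pdf(midpoint)))
    return max(gaps)


gap_small = max_gap(100)
gap_large = max_gap(100_000)
print(f"max gap with 100 observations:     {gap_small:.4f}")
print(f"max gap with 100,000 observations: {gap_large:.4f}")
```

The counts returned by frequency_distribution are the tabular display; dividing by the sample size and the interval width gives the histogram heights that are compared with the pdf.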
Descriptive statistics
describe, or summarise, data in ways that are meaningful and useful
Kurtosis
Distributions are referred to as being
1. Leptokurtic (+): positive excess kurtosis
(more peaked than the normal; more
observations close to the mean and out in the
tails; fatter tails)
2. Platykurtic (-): negative excess kurtosis
(less peaked than the normal; more evenly
distributed across the range of possible
values; thinner tails)
3. Mesokurtic (0): zero excess kurtosis
(equivalent to the normal distribution).
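The three cases can be illustrated with simulated data. This is a sketch rather than a prescribed method: it uses the simple moment-based estimator (fourth standardised moment minus 3), and the choice of distributions, the tolerance of 0.1 and the names excess_kurtosis and classify are assumptions for the example.

```python
import random


def excess_kurtosis(data):
    """Moment-based sample excess kurtosis: m4 / m2^2 - 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0


def classify(k, tol=0.1):
    """Label a distribution by the sign of its excess kurtosis."""
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "mesokurtic"


rng = random.Random(42)
n = 50_000
samples = {
    # Normal draws: excess kurtosis of 0 in the population.
    "normal": [rng.gauss(0, 1) for _ in range(n)],
    # Uniform draws: thin tails, population excess kurtosis of -1.2.
    "uniform": [rng.uniform(-1, 1) for _ in range(n)],
    # Difference of two exponentials is Laplace: fat tails, excess kurtosis of 3.
    "laplace": [rng.expovariate(1) - rng.expovariate(1) for _ in range(n)],
}

labels = {}
for name, data in samples.items():
    k = excess_kurtosis(data)
    labels[name] = classify(k)
    print(f"{name:8s} excess kurtosis = {k:6.3f} -> {labels[name]}")
```

Statistical packages often apply small-sample bias corrections, so their reported kurtosis can differ slightly from this estimator.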
Covariance and correlation are both measures of the extent to which two random variables move
together
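A short numerical sketch may help separate the two measures. The return figures below are invented for illustration, and the helper names covariance and correlation are assumptions; the sample covariance uses the n - 1 divisor. The example also shows a well-known difference: covariance depends on the units of measurement, while correlation is unit-free and lies between -1 and 1.

```python
import math


def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance scaled by the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical daily returns (in percent) for two stocks.
stock_a = [1.0, -0.5, 0.8, -1.2, 0.4]
stock_b = [0.7, -0.2, 0.5, -0.9, 0.1]

cov_ab = covariance(stock_a, stock_b)
corr_ab = correlation(stock_a, stock_b)

# Express stock A in basis points instead of percent (multiply by 100).
stock_a_bp = [100 * r for r in stock_a]
cov_scaled = covariance(stock_a_bp, stock_b)
corr_scaled = correlation(stock_a_bp, stock_b)

print(f"covariance:  {cov_ab:.4f} (percent) vs {cov_scaled:.4f} (basis points)")
print(f"correlation: {corr_ab:.4f} (percent) vs {corr_scaled:.4f} (basis points)")
```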
Simple and log returns
Advantages of using log returns
- They can be interpreted as continuously compounded
returns
- They can be added up over time, e.g. a weekly log return is
the sum of the daily log returns for that week
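The additivity of log returns over time follows from ln(P5/P0) = ln(P1/P0) + ln(P2/P1) + ... + ln(P5/P4), since the intermediate prices cancel. The sketch below checks this numerically; the price series is hypothetical and chosen only for illustration.

```python
import math

# Hypothetical closing prices: previous Friday's close, then five trading days.
prices = [100.0, 101.0, 99.5, 102.0, 103.5, 104.0]

# Daily simple returns: P_t / P_{t-1} - 1.
simple_returns = [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]
# Daily log returns: ln(P_t / P_{t-1}).
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Weekly log return, two ways: sum of daily values vs. direct calculation.
weekly_log_summed = sum(log_returns)
weekly_log_direct = math.log(prices[-1] / prices[0])

# Simple returns do not add up: they must be compounded instead.
weekly_simple_direct = prices[-1] / prices[0] - 1
weekly_simple_summed = sum(simple_returns)

print(f"weekly log return, summed:    {weekly_log_summed:.6f}")
print(f"weekly log return, direct:    {weekly_log_direct:.6f}")
print(f"weekly simple return, direct: {weekly_simple_direct:.6f}")
print(f"sum of daily simple returns:  {weekly_simple_summed:.6f}")
```

The two log-return figures agree up to floating-point rounding, while the sum of the daily simple returns differs from the true weekly simple return. Converting back with exp(weekly log return) - 1 recovers the weekly simple return.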
Disadvantage: