Descriptive statistics
Discrete vs. Continuous
● Discrete: a finite (or countable) number of outcomes
● Continuous: a range of possible values, with infinitely many possible values
between any two points. Examples: uniform, normal, exponential, other
Properties of distributions:
● Variability: range, standard deviation and variance
● Central tendency: mode, median, mean
● Modal classes: unimodal, bimodal
● Shape: symmetric and skewed
The normal distribution
● Bell shaped and symmetrical
● Mean, median and mode are equal
● Range is infinite: the curve extends from minus infinity to plus infinity
● Increasing the mean shifts the curve to the right
● Decreasing the mean shifts the curve to the left
● A larger standard deviation widens the curve; a smaller one narrows it
● Normal density function: the probability density function of a normal random variable
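The properties in this list can be checked numerically. The following is a minimal sketch, assuming the standard density formula f(x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²)); the function name normal_pdf and the chosen values are illustrative and not part of the notes.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The curve peaks at the mean and is symmetric around it.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.0), normal_pdf(1.0)

# A larger mean moves the peak to the right without changing its height.
shifted_peak = normal_pdf(2.0, mu=2.0)

# Doubling the standard deviation halves the peak height, so the curve is wider and flatter.
wide_peak = normal_pdf(0.0, sigma=2.0)
```

Comparing left with right shows the symmetry, and comparing wide_peak with peak shows how the standard deviation stretches the curve.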
Statistical inference
● Parameter: descriptive measure of a population
● Statistic: descriptive measure of a sample
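● 95% confidence interval: the estimation procedure captures the true parameter in 95% of repeated samples
● 5% significance level: a 5% probability of rejecting a null hypothesis that is actually true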
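What the 95% refers to can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is normal with a known standard deviation, the interval is the z-interval x̄ ± z·σ/√n, and the function name coverage and all parameter values are made up for illustration.

```python
import math
import random
from statistics import NormalDist, mean

def coverage(trials=1000, n=25, mu=50.0, sigma=10.0, level=0.95, seed=0):
    """Share of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / math.sqrt(n)      # assumes sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)                   # the statistic estimating the parameter mu
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

share = coverage()
```

The value of share should land close to 0.95: roughly 95 out of 100 intervals built this way contain the parameter, and about 5 out of 100 miss it.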
Describing a set of nominal data
● Univariate: technique applied to a single set of data
● Bivariate: depicts the relationship between two variables (cross-classification table)
Graphical descriptive techniques
Graphical techniques to describe interval data
● Histogram: helps explain important aspects of probability (number of classes = 1 + 3.3 log10(n))
Shapes of histograms:
- Symmetry: two sides identical
- Skewness: positively skewed = long tail to the right (descending), negatively skewed = long tail to the left (ascending)
- Unimodal: single peak
- Bimodal: two peaks, not necessarily equal in height
- Bell shaped: special type of symmetric unimodal histogram (empirical rule)
● Stem-and-leaf display
● Relative frequency distribution: dividing frequencies by number of observations (%)
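The class-count rule from the histogram bullet and the relative frequency distribution can be sketched in a few lines of Python. The logarithm is taken as base 10, and rounding up to a whole class is an assumption; the function names and the data are illustrative only.

```python
import math
from collections import Counter

def sturges_classes(n):
    """Suggested number of histogram classes: 1 + 3.3 * log10(n), rounded up (assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def relative_frequencies(values):
    """Map each distinct value to its frequency divided by the number of observations."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}

# Made-up grades, for illustration only.
grades = [6, 7, 7, 8, 6, 7, 9, 7, 8, 6]
k = sturges_classes(len(grades))    # 1 + 3.3 * 1 = 4.3, rounded up to 5
rel = relative_frequencies(grades)  # the shares add up to 1
```

Multiplying each share in rel by 100 gives the percentages mentioned in the bullet.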
Graphical techniques to describe time-series data
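● Cross-sectional data: observations measured at the same point in time (time-series data are measured at successive points in time)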
● Line chart
If the difference between two categories is positive, the result is a surplus; if the difference is
negative, the result is a deficit.
Graphical techniques to describe the relationship between two interval variables
● Scatter diagram: relationship between two interval variables
Dependent variable y and independent variable x
→ measures of linear relationships
Positive linear relationship: one variable increases when the other does
Negative linear relationship: the variables tend to move in opposite directions
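The direction of a linear relationship can also be expressed as a number. The sketch below assumes that the measures meant are the sample covariance and the coefficient of correlation; the function names and the data are hypothetical.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Coefficient of correlation: covariance divided by both sample standard deviations."""
    sx = math.sqrt(sample_covariance(xs, xs))
    sy = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)

hours = [1, 2, 3, 4, 5]        # independent variable x
score = [52, 55, 61, 64, 70]   # dependent variable y that rises with x
errors = [9, 8, 6, 5, 2]       # dependent variable y that falls as x rises

r_pos = correlation(hours, score)   # close to +1: positive linear relationship
r_neg = correlation(hours, errors)  # close to -1: negative linear relationship
```

The sign gives the direction of the relationship, and a value near +1 or -1 means the points in the scatter diagram lie close to a straight line.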
Numerical descriptive techniques
Measures of central location
● Mean
● Median (better when there is a small number of extreme observations, or to measure
how well you performed relative to the class)
● Mode
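The three measures can be compared on a small data set using Python's standard statistics module. The scores below are made up; the last one is an extreme observation, added to show why the median can be the better measure.

```python
import statistics

# Made-up exam scores; 30 is an extreme observation.
scores = [6, 7, 7, 8, 9, 30]

mean = statistics.mean(scores)      # 67 / 6, pulled upward by the extreme value
median = statistics.median(scores)  # average of the two middle values, 7 and 8
mode = statistics.mode(scores)      # most frequent value
```

Here the mean is above 11 although five of the six scores are below 10, while the median of 7.5 still describes a typical score.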
Measures of variability (interval data)
● Range: largest observation minus smallest observation
● Variance: how far the data are spread out around the mean
- Population variance σ²
- Sample variance s²
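The difference between the two variances is the divisor: the population variance divides the sum of squared deviations by N, the sample variance by n - 1. A minimal sketch with made-up data, using Python's standard statistics module:

```python
import statistics

# Made-up observations; their mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # largest minus smallest observation
pop_var = statistics.pvariance(data)     # 32 / 8: treats the data as the whole population
sample_var = statistics.variance(data)   # 32 / 7: treats the data as a sample
pop_sd = statistics.pstdev(data)         # standard deviation = square root of the variance
```

The sample variance is always slightly larger than the population variance computed on the same numbers, because it divides by n - 1 instead of N.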