Business Statistics
Lecture 1
Data are put into a Data Matrix or Data Frame.
Columns: variables (each may have an identifying name, like “age”)
Rows: subjects/cases (each may have an identifying name, like “John”)
Cells: observations
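A minimal sketch of a data matrix using only the Python standard library (in practice the pandas DataFrame plays this role). The variable names, case names and values are hypothetical: each row is a case, each column a variable, and each cell one observation.

```python
# Hypothetical data matrix: rows are subjects/cases, columns are variables.
columns = ["age", "income"]          # variable (column) names
row_names = ["John", "Mary", "Ali"]  # case (row) names
data = [
    [34, 52000],  # John
    [28, 61000],  # Mary
    [45, 47000],  # Ali
]

def cell(row_name, column):
    """Return the single observation for one case and one variable."""
    return data[row_names.index(row_name)][columns.index(column)]

def column_values(column):
    """Return all observations of one variable (one column)."""
    j = columns.index(column)
    return [row[j] for row in data]

print(cell("John", "age"))      # 34
print(column_values("income"))  # [52000, 61000, 47000]
```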
The population is the collection of all possible data points:
typically we do *not* have it!
A sample is a subset of data taken from the population.
We use this sample to infer something about the population:
• e.g., is there sufficient support for increasing expat subsidies among low-income residents?
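A minimal sketch of the population/sample distinction, assuming a simulated (not real) population of 10,000 incomes. In reality the population mean is unknown; we only see the sample and use its mean as an estimate.

```python
import random
import statistics

# Hypothetical population of 10,000 incomes (simulated for illustration only).
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [rng.gauss(40000, 8000) for _ in range(10_000)]

# In practice we only observe a sample: here a simple random sample of 200.
sample = rng.sample(population, 200)

print(f"population mean: {statistics.mean(population):.0f}")  # usually unknown
print(f"sample mean:     {statistics.mean(sample):.0f}")      # our estimate
```

Changing the seed gives a different sample and hence a slightly different sample mean, which is exactly the randomness discussed next.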
A sample always involves randomness:
it could have been a different sample.
This is where statistical analysis comes in:
• we need a model that describes which samples could have been drawn
• if the observed sample is at odds with the model, we conclude that the model (theory) is rejected by the data
This holds for simple random samples, stratified samples, etc.
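A minimal sketch of "what could have been", assuming a hypothetical survey: the model (theory) says exactly 50% of residents support the policy, and our sample of 100 contains 60 supporters. The code simulates many samples under the model and counts how often they are at least as extreme as the observed one (a simulated one-sided p-value). The 5% threshold is a common convention, assumed here rather than taken from the lecture.

```python
import random

# Hypothetical numbers for illustration only.
n = 100            # sample size
observed_yes = 60  # supporters in our observed sample
p_model = 0.5      # the model (theory): half the population supports the policy

rng = random.Random(1)
n_sims = 10_000

def simulate_yes_count():
    """Draw one sample of size n that 'could have been' if the model were true."""
    return sum(rng.random() < p_model for _ in range(n))

# How often does the model produce a result at least as extreme as ours?
extreme = sum(simulate_yes_count() >= observed_yes for _ in range(n_sims))
p_value = extreme / n_sims

print(f"simulated one-sided p-value: {p_value:.3f}")
if p_value < 0.05:  # assumed 5% significance level
    print("The data are at odds with the model: reject it.")
else:
    print("The data are compatible with the model.")
```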
Outliers are observations that behave substantially differently from
the bulk of the data.
Outliers can have a substantial influence on many of our statistical procedures.
Solutions if there are outliers:
• either check whether your results change with versus without the
outliers, and report and interpret this
• or replace the outliers by more reasonable values (e.g., some upper or lower
quantile, called censoring or winsorizing; or an average value)
• or use outlier-robust statistical procedures (such as non-parametric tests)
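The three options above can be sketched on a small hypothetical data set. The 1.5 × IQR rule used to flag outliers and the 10th/90th percentile cut-offs used for censoring are assumptions chosen for illustration; the lecture does not fix either. The median stands in for an outlier-robust procedure.

```python
import statistics

# Hypothetical monthly sales figures; 950 looks like an outlier.
sales = [102, 98, 110, 95, 105, 99, 101, 950]

# Flag outliers with the 1.5 * IQR rule (an assumed rule of thumb).
q1, _, q3 = statistics.quantiles(sales, n=4)
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
without_outliers = [x for x in sales if lower_fence <= x <= upper_fence]

# Option 1: report the result with versus without the outliers.
mean_with = statistics.mean(sales)
mean_without = statistics.mean(without_outliers)

# Option 2: censor (winsorize) by capping values at the 10th and 90th
# percentiles (assumed cut-offs).
deciles = statistics.quantiles(sales, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]
winsorized = [min(max(x, p10), p90) for x in sales]
mean_winsorized = statistics.mean(winsorized)

# Option 3: use an outlier-robust statistic such as the median.
median_all = statistics.median(sales)

print(f"mean with outliers:    {mean_with:.1f}")
print(f"mean without outliers: {mean_without:.1f}")
print(f"winsorized mean:       {mean_winsorized:.1f}")
print(f"median (robust):       {median_all:.1f}")
```

Note how the single extreme value roughly doubles the mean, while the median barely moves.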
Skewness is a measure of asymmetry
• mainly used as a benchmark for normality or symmetry: Skew ≈ 0
Kurtosis is a measure of tail fatness (heavy versus light tails):
• mainly used as a benchmark for normality: Kurt ≈ 3 (the kurtosis of a normal distribution)
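A minimal sketch of both measures, assuming the simple moment-based definitions (skewness m3 / m2^1.5, kurtosis m4 / m2^2, with m_k the k-th central moment); statistical packages often use bias-corrected variants, and many report excess kurtosis (kurtosis − 3), whose normal benchmark is 0. The simulated data are illustrative only.

```python
import random
import statistics

def central_moment(xs, k):
    """k-th central moment: the average of (x - mean)**k."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness m3 / m2**1.5; about 0 for symmetric data."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Moment-based kurtosis m4 / m2**2; about 3 for normal data."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

rng = random.Random(0)
normal_data = [rng.gauss(0, 1) for _ in range(100_000)]       # symmetric
skewed_data = [rng.expovariate(1.0) for _ in range(100_000)]  # long right tail

print(f"normal: skew={skewness(normal_data):.2f}, kurt={kurtosis(normal_data):.2f}")
print(f"skewed: skew={skewness(skewed_data):.2f}, kurt={kurtosis(skewed_data):.2f}")
```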
Correlation coefficient: $\rho_{x,y} \approx 1$ (strong positive linear relation), $\rho_{x,y} \approx -1$ (strong negative linear relation)
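A minimal sketch of the (Pearson) correlation coefficient, computed as the covariance divided by the product of the standard deviations. The data are hypothetical; Python 3.10+ also ships `statistics.correlation`, but the formula is written out here to show what it computes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # points on an increasing line: 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))  # points on a decreasing line: -1.0
```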
Joint probability table for events C and S (C’ and S’ denote their complements):
        C            C’            Total
S       P(C ∩ S)     P(C’ ∩ S)     P(S)
S’      P(C ∩ S’)    P(C’ ∩ S’)    P(S’)
Total   P(C)         P(C’)         1
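A minimal sketch of how the joint probability table for C and S is used, assuming hypothetical cell probabilities. The marginal probabilities are the row and column totals, all cells sum to 1, and a conditional probability follows as P(C | S) = P(C ∩ S) / P(S).

```python
import math

# Hypothetical joint probabilities P(row ∩ column); C' and S' are complements.
joint = {
    ("S", "C"): 0.20, ("S", "C'"): 0.30,
    ("S'", "C"): 0.10, ("S'", "C'"): 0.40,
}
rows, cols = ["S", "S'"], ["C", "C'"]

# Marginal probabilities are the row and column totals of the table.
row_marginal = {r: sum(joint[(r, c)] for c in cols) for r in rows}
col_marginal = {c: sum(joint[(r, c)] for r in rows) for c in cols}

# All cells together sum to 1 (the bottom-right corner of the table).
assert math.isclose(sum(joint.values()), 1.0)

# Conditional probability read off the table: P(C | S) = P(C ∩ S) / P(S).
p_c_given_s = joint[("S", "C")] / row_marginal["S"]

print({r: round(p, 2) for r, p in row_marginal.items()})  # {'S': 0.5, "S'": 0.5}
print({c: round(p, 2) for c, p in col_marginal.items()})  # {'C': 0.3, "C'": 0.7}
print(round(p_c_given_s, 2))                              # 0.4
```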