“Not all theory can be written down easily, some of it is just logic” – Mert 2024
Important things/theory in part 1:
• Categorical variables: values are categories; also known as qualitative variables
• Ordinal scale: categorical values have a natural ordering
• Nominal scale: categorical values are unordered
• Metric variables: numerical values; also known as quantitative variables
• Ratio scale:
– Values can be ordered
– Specific numerical distance or interval between values
– Has a meaningful or true zero point
– Values can be added, subtracted and multiplied
• Interval scale:
– Values can be ordered
– Specific numerical distance or interval between values
– Values can be added or subtracted
– No meaningful or true zero point!
                 Natural   Specific numerical    Can add or        Can be       True zero
                 order     distance in between   subtract values   multiplied   point
                           values
Categorical
  1. Nominal
  2. Ordinal     X
Metric
  3. Interval    X         X                     X
  4. Ratio       X         X                     X                 X            X
• All categorical variables are discrete
• Metric variables could be either discrete or continuous
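Measures of centre;
• Mean with observations
$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
• Properties of the mean
– Only for metric variables
– Very sensitive to outliers
• Mean with absolute frequencies
$\bar{y} = \frac{1}{n} \sum_{j=1}^{k} f_j \times y_j$
(f_j = absolute frequency of the j-th distinct value y_j, k = number of distinct values, n = total number of observations)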
• Properties of the median
– For metric and ordinal variables (not for nominal variables)
– Not sensitive to outliers
• Calculating the median from frequency tables
– (Cumulative) absolute frequencies
M = value of the (n+1)/2-th observation
– (Cumulative) relative frequencies
M = value of the observation at cumulative proportion p = 0.50
Could be less precise because of rounding
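The cumulative-frequency rule above can be sketched in Python. This is only an illustration: the frequency table and the helper names (`value_at_position`, `median_from_frequencies`) are made up for this example. When n is even, (n+1)/2 falls between two observations; averaging the two middle values, as done here, is one common convention and only makes sense for metric variables.

```python
# Hypothetical frequency table: value -> absolute frequency (illustrative only).
freq_table = {1: 3, 2: 5, 3: 4, 4: 2, 5: 1}


def value_at_position(table, position):
    """Return the value of the observation at a 1-based position in sorted order."""
    cumulative = 0
    for value in sorted(table):
        cumulative += table[value]  # cumulative absolute frequency
        if cumulative >= position:
            return value
    raise ValueError("position exceeds the number of observations")


def median_from_frequencies(table):
    """Median = value of the (n+1)/2-th observation."""
    n = sum(table.values())
    if n % 2 == 1:
        return value_at_position(table, (n + 1) // 2)
    # Even n: average the two middle observations (metric variables only).
    lower = value_at_position(table, n // 2)
    upper = value_at_position(table, n // 2 + 1)
    return (lower + upper) / 2


median = median_from_frequencies(freq_table)
print(median)
```

Here n = 15, so the median is the 8th observation; the cumulative frequencies are 3, 8, 12, 14, 15, which places the 8th observation at the value 2.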
• Properties of the mode
– For metric and categorical variables (all possible variables)
– Less informative than mean or median
• Mode is the value that occurs most frequently = the value with the highest absolute or
relative frequency
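As a rough check on the measures of centre above, the sketch below computes the mean both ways (from observations and from absolute frequencies) and the mode. The data are invented for illustration; `Counter` and `statistics.multimode` are from the Python standard library.

```python
from collections import Counter
from statistics import multimode

# Hypothetical sample of 8 observations (illustrative data only).
observations = [2, 3, 3, 5, 5, 5, 7, 10]
n = len(observations)

# Mean with observations: add up all values and divide by n.
mean_raw = sum(observations) / n

# Mean with absolute frequencies: weight each distinct value by its count.
freq = Counter(observations)  # value -> absolute frequency
mean_freq = sum(f * y for y, f in freq.items()) / n

# Mode: the value(s) with the highest absolute frequency.
modes = multimode(observations)

# The mode also works for nominal data, where a mean cannot be computed.
colours = ["red", "blue", "blue", "green", "blue", "red"]
colour_modes = multimode(colours)

print(mean_raw, mean_freq, modes, colour_modes)
```

Both mean calculations give the same result because the frequency version only groups identical terms of the same sum. `multimode` returns a list, since a data set can have more than one mode.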
Measures of variability;
• Range
– Range is the difference between the largest and smallest values
– For metric variables (not for categorical variables)
• Variance & Standard Deviation
– Variance: the average squared distance from the sample mean y̅ (dividing by n − 1)
– For metric variables (not for categorical variables)
– Statistical notation of the variance: S²
– Variance: based on the squared deviations
– Units of measurement: the squares of those of the original data
– Difficult to interpret
– Square root of the variance = standard deviation
– Statistical notation of standard deviation: S
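Variance, from observations and from absolute frequencies:
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
$S^2 = \frac{1}{n-1} \sum_{j=1}^{k} (y_j - \bar{y})^2 \times f_j$
Standard deviation, from observations and from absolute frequencies:
$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2}$
$S = \sqrt{\frac{1}{n-1} \sum_{j=1}^{k} (y_j - \bar{y})^2 \times f_j}$
(f_j = absolute frequency of the j-th distinct value y_j, k = number of distinct values, n = total number of observations)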
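A minimal sketch of the variance and standard deviation, assuming an invented sample. It computes S² from the raw observations and from absolute frequencies, and compares the result with `statistics.variance` and `statistics.stdev`, which also divide by n − 1.

```python
import math
from collections import Counter
from statistics import stdev, variance

# Hypothetical sample (illustrative data only).
observations = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(observations)
y_bar = sum(observations) / n  # sample mean = 5.0

# Variance from observations: squared deviations summed, divided by n - 1.
s2_raw = sum((y - y_bar) ** 2 for y in observations) / (n - 1)

# Variance from absolute frequencies: each squared deviation times its count.
freq = Counter(observations)  # value -> absolute frequency
s2_freq = sum(f * (y - y_bar) ** 2 for y, f in freq.items()) / (n - 1)

# Standard deviation: square root of the variance, in the original units.
s = math.sqrt(s2_raw)

print(s2_raw, s2_freq, s)
print(variance(observations), stdev(observations))
```

The squared deviations add up to 32, so S² = 32/7 ≈ 4.57 and S ≈ 2.14.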
• Variation coefficient
– The ratio of the standard deviation S to the mean y̅ = relative standard deviation
– Statistical notation of the variation coefficient: V
– Only for metric variables (not for categorical)
– Often expressed as a percentage
– Used to compare the variability between groups or variables
$V = \frac{S}{\bar{y}}$
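To illustrate why V is useful for comparing groups measured in different units, here is a small sketch with two invented variables. The function name `variation_coefficient` is made up for this example, and it assumes a positive mean.

```python
from statistics import mean, stdev


def variation_coefficient(values):
    """V = S / mean; assumes the mean is positive."""
    return stdev(values) / mean(values)


# Hypothetical measurements (illustrative data only).
heights_cm = [160, 170, 180]
weights_kg = [50, 70, 90]

v_height = variation_coefficient(heights_cm)
v_weight = variation_coefficient(weights_kg)

# Expressed as percentages, the two variables become directly comparable.
print(f"height: {v_height:.1%}, weight: {v_weight:.1%}")
```

The standard deviations (10 cm and 20 kg) cannot be compared directly because the units differ, but V shows that the weights vary more relative to their mean than the heights do.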
Measures of position;
• Percentiles
– The pth percentile is the point such that p% of the observations fall below or
at that point and (100-p)% fall above it
– For metric and ordinal variables (not for nominal variables)
– Important percentiles
– Median = 50% percentile (p=50) = Q2
– Lower quartile = 25% percentile (p=25) = Q1
– Upper quartile = 75% percentile (p=75) = Q3
• Interquartile Range (IQR)
– difference between the upper and lower quartiles
– For metric variables (not for categorical variables)
– Measures the variability of the middle half of the observations
– The larger the IQR, the greater the variability
– Also used to detect outliers (see Outliers below)
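The percentile definition above can be sketched with the nearest-rank rule: take the smallest observation such that at least p% of the observations fall at or below it. This is only one of several conventions; statistical software often interpolates between observations and may report slightly different quartiles. The data and the function name `percentile` are invented for this example.

```python
import math


def percentile(values, p):
    """Nearest-rank p-th percentile: smallest value with at least p% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based position
    return ordered[rank - 1]


# Hypothetical sample of 11 observations (illustrative data only).
data = [7, 1, 9, 3, 11, 5, 2, 8, 4, 10, 6]

q1 = percentile(data, 25)  # lower quartile
q2 = percentile(data, 50)  # median
q3 = percentile(data, 75)  # upper quartile
iqr = q3 - q1              # spread of the middle half of the observations

print(q1, q2, q3, iqr)
```

With n = 11 the median is the 6th ordered observation, which agrees with the (n+1)/2 rule for the median.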
• Box plot:
– graphical summary based on five numbers, which shows both the center and
variability of the observations
– 100% percentile = maximum (except for outliers)
– 75% percentile = upper quartile = Q3
– 50% percentile = median = Q2
– 25% percentile = lower quartile = Q1
– 0% percentile = minimum (except for outliers)
• Outliers:
– Observations which fall more than 1.5 IQR above the upper quartile or more
than 1.5 IQR below the lower quartile
– They are separately marked in box plots
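The 1.5 IQR rule and the five numbers behind a box plot can be sketched as follows. The data are invented, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which is one of several quartile conventions, so other tools may place the fences slightly differently.

```python
from statistics import quantiles

# Hypothetical sample with one extreme value (illustrative data only).
data = [3, 5, 6, 7, 8, 9, 10, 12, 40]

# Quartiles Q1, Q2 (median) and Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: observations beyond these limits count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

# Box plot whiskers end at the most extreme observations that are not outliers.
inside = [y for y in data if lower_fence <= y <= upper_fence]
five_numbers = (min(inside), q1, q2, q3, max(inside))

print(outliers)
print(five_numbers)
```

In this sample Q1 = 6 and Q3 = 10, so the fences lie at 0 and 16; the value 40 is marked separately as an outlier and the upper whisker ends at 12.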