STATISTICS ONE 2023
(101- Welcome to statistics)
Why statistics?
- Make sense of collected data
- Discover patterns and causal relationships
- Possibilities and limitations of sampling
- Critical reflection on existing research
- Become critical observers of news
https://levente.littvay.hu/chromebook/
(102- Variables and levels of measurement)
Variable
“Any characteristic, number, or quantity that can be measured and can differ across entities
or across time.”
Types of variables
- Variables have different scales or “levels of measurement”
- level of measurement = the nature of the information carried by the values assigned to a variable
Levels of measurement: categorical
nominal variables
● two or more exclusive categories
● no natural order
○ eye color
○ marital status
○ hair color
● no arithmetic operations possible (subtraction, addition)
ordinal variables
● clear ordering of the values; low-high, little-much, small-large
○ education level (high school, college)
○ political interest (low, middle, high)
○ agreement with a statement (strongly disagree, agree, etc.)
● distance between values is not necessarily the same across levels
● values can be compared, but only in relative terms (higher/lower, not by how much)
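A minimal sketch of what "ordered, but only relatively comparable" can look like in practice. The five-point agreement scale, the names AGREEMENT_LEVELS, RANK and is_higher, and the sample answers are all hypothetical and only serve as illustration.

```python
# Hypothetical ordinal scale: the order is meaningful, the spacing is not.
AGREEMENT_LEVELS = [
    "strongly disagree",
    "disagree",
    "neutral",
    "agree",
    "strongly agree",
]

# Rank codes only record the position of each label in the ordering.
RANK = {label: position for position, label in enumerate(AGREEMENT_LEVELS)}


def is_higher(a: str, b: str) -> bool:
    """Return True if answer a expresses more agreement than answer b."""
    return RANK[a] > RANK[b]


answers = ["agree", "strongly disagree", "neutral", "agree"]

# Sorting and comparing are valid operations on ordinal data.
sorted_answers = sorted(answers, key=RANK.get)
print(sorted_answers)
print(is_higher("agree", "neutral"))

# Subtracting rank codes is NOT meaningful: the step from "neutral" to
# "agree" need not be the same size as the step from "agree" to
# "strongly agree".
```

The rank codes are an arbitrary choice; any increasing set of numbers would preserve the same ordering, which is why differences between them carry no information.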
Levels of measurement: numerical
Continuous variables
● can be measured to any level of precision
● in decimals or fractions
e.g. ‘height’ can take on any value (175.25252637 cm)
● continuous variables can be ‘measured’ in discrete terms (e.g. height rounded to whole centimeters)
discrete variables
● only certain, countable values (usually whole numbers) are possible
○ number of pets
○ points in exam
○ number of car accidents
● are always measured in discrete terms
Alternative levels of measurement
Two forms of ‘continuous’ variables:
1. interval: numerical variable, but the zero is arbitrary/meaningless
zero is just another point on the scale and does not mean ‘nothing’ (Celsius, Fahrenheit)
2. ratio: like interval, but with a meaningful zero (height, weight, salary, temperature in kelvin)
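A small sketch of why the position of the zero matters, assuming hypothetical temperatures and heights. On an interval scale a statement like "twice as much" depends on the unit chosen; on a ratio scale it does not. The helper names are illustrative only.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def cm_to_inches(cm: float) -> float:
    """Convert a length from centimeters to inches."""
    return cm / 2.54


# Interval scale: the ratio changes when the (arbitrary) zero moves.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(ratio_celsius, ratio_fahrenheit)

# Ratio scale: zero means "no length" in both units, so the ratio survives
# the change of unit (up to floating-point rounding).
ratio_cm = 180 / 90
ratio_inches = cm_to_inches(180) / cm_to_inches(90)
print(ratio_cm, ratio_inches)
```

The same two temperatures give a ratio of 2 in Celsius but 68/50 in Fahrenheit, so "20 degrees is twice as warm as 10 degrees" is not a meaningful claim, while "180 cm is twice as tall as 90 cm" is.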
(103- Measurement of central tendency)
When we collect data, we can show how the values are distributed in relation to other values
Frequency distribution = display of the pattern of frequencies of a variable in a statistical
data set
- shows all the possible values (or intervals) of the data and how often (i.e. how frequently)
they occur
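A frequency distribution can be sketched with the standard library's collections.Counter. The data (number of pets in ten households) is made up for illustration.

```python
from collections import Counter

# Hypothetical sample: number of pets in each of ten households.
pets_per_household = [0, 1, 1, 2, 0, 1, 3, 0, 1, 2]

# Counter maps each observed value to how often it occurs.
frequencies = Counter(pets_per_household)
n = len(pets_per_household)

# Print absolute and relative frequencies in order of the values.
for value in sorted(frequencies):
    count = frequencies[value]
    print(f"{value} pets: {count} households ({count / n:.0%})")
```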
How can we summarize/describe distributions of variables?
Option 1: Visualize data
Option 2: Calculate measures to summarize data
- Measure of central tendency = a value that describes a set of data by
identifying the central position within that set of data
- Measure of dispersion = how stretched or squeezed is the distribution?
(1) mode
➔ Most frequent score in a dataset (the value with the highest count)
➔ Data with one mode is ‘unimodal’
➔ There can be several modes (two modes = ‘bimodal’)
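As a quick illustration, the mode can be computed with statistics.multimode from the Python standard library (Python 3.8 or later), which returns every value tied for the highest frequency. The score lists and eye colors are hypothetical.

```python
from statistics import multimode

unimodal_scores = [2, 3, 3, 3, 4, 5]
bimodal_scores = [1, 1, 1, 4, 6, 6, 6]
eye_colors = ["brown", "blue", "brown", "green"]

# One most frequent value: unimodal.
print(multimode(unimodal_scores))  # [3]

# Two values tied for most frequent: bimodal.
print(multimode(bimodal_scores))   # [1, 6]

# The mode only needs counts, so it also works for nominal variables.
print(multimode(eye_colors))       # ['brown']
```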
(2) median
➔ Middle score for a dataset that has been arranged in order of magnitude
➔ With an even number of scores there is no single middle score: take the mean of the two middle scores
➔ Also possible for ordinal variables
➔ The mean is sensitive to extreme values (outliers)
➔ If extreme values are in the data set, the median may be more useful
➔ Median = ‘robust’ statistic
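A short sketch of the median, using statistics.median and statistics.mean from the Python standard library. It shows the odd and even cases and how one outlier moves the mean far more than the median. All numbers are hypothetical (salaries in thousands).

```python
from statistics import mean, median

# Odd number of scores: the median is the single middle value after sorting.
odd_scores = [7, 3, 9, 5, 1]
print(median(odd_scores))   # 5

# Even number of scores: the median is the mean of the two middle values.
even_scores = [7, 3, 9, 5]
print(median(even_scores))  # 6.0

# Robustness: add one extreme value and compare mean and median.
salaries = [30, 32, 35, 38, 40]
salaries_with_outlier = salaries + [400]

print(mean(salaries), median(salaries))
print(mean(salaries_with_outlier), median(salaries_with_outlier))
```

With the outlier added, the mean rises from 35 to roughly 95.8, while the median only moves from 35 to 36.5.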