Lecture 1:
Measures of central tendency and spread
Distributions, means & deviation
➔ Introduction:
→ Why statistics?
- Data is everywhere
- Needed for political analysis and inferences
- Makes sense of collected data
- Discover patterns and causal relationships
- Learn to think critically about data
➔ Variable:
→ Anything that can be measured and can differ across entities or across time
- If it doesn’t vary, it is a constant, so by definition it is not a variable
- E.g.: hair color / level of trust / age
➔ Crucial distinction in variables:
Independent variable (X):
• Refers to the cause (has an effect on the dependent variable)
Dependent variable (Y):
• Outcome (the variable being explained)
➔ Variables have different scales or different levels of measurement
→ Levels of measurement: the nature of the information within the values assigned to variables
→ We can divide them into two categorical and two continuous levels
➔ Categorical variables:
1. Nominal
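- Two or more mutually exclusive categories
- No natural order
- No arithmetic operations possible (you cannot subtract values or say one is greater than another; you can only say whether two values are equal)
- E.g.: political party / favorite football club / hair color
- Described with frequencies and the mode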
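A minimal sketch of what you can do with a nominal variable: since only equality is meaningful, the basic operation is counting how often each category occurs and reading off the mode. The party labels and responses below are hypothetical.

```python
from collections import Counter

# Hypothetical nominal data: party voted for by ten respondents
parties = ["Labour", "Green", "Labour", "Liberal", "Green",
           "Labour", "Conservative", "Green", "Labour", "Liberal"]

# Only equality is meaningful, so counting categories is the basic operation
counts = Counter(parties)
print(counts.most_common())

# The mode is the most frequent category
mode_label, mode_count = counts.most_common(1)[0]
print(mode_label, mode_count)
```

Note that nothing here sorts or subtracts the labels; any ordering in the output comes from the frequencies, not from the categories themselves.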
2. Ordinal
- Clear ordering of the values (lower or higher)
- E.g.: level of agreement / level of education / political interest / trust in government
- Spacing between values is not necessarily the same across levels (so you cannot say that someone who agrees agrees 5x more than someone else)
- Only relative comparison (whether something is lower or higher)
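Because ordinal categories have an order but no fixed spacing, one hedged approach is to store the order explicitly and use order-based summaries such as the median category. The 5-point scale and the answers are hypothetical.

```python
# Hypothetical 5-point agreement scale, listed from lowest to highest
SCALE = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
RANK = {label: i for i, label in enumerate(SCALE)}

answers = ["agree", "neutral", "strongly agree", "disagree", "agree"]

# Ordering is meaningful: we can say one answer is higher than another
print(RANK["agree"] > RANK["neutral"])

# Sort by position on the scale and take the middle answer (odd number of answers)
ordered = sorted(answers, key=RANK.get)
median_answer = ordered[len(ordered) // 2]
print(median_answer)
```

The rank numbers are only positions; averaging them would silently assume equal spacing between categories, which is exactly what an ordinal scale does not guarantee.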
➔ Continuous variables
• Continuous variables can be truly continuous, but can also be discrete (e.g. heights recorded in whole numbers)
• Continuous: can be measured with any level of precision
• Discrete: can only take certain values (countable numbers)
→ The difference between two values is meaningful
- Comparisons have meaning (how much more; "twice as much" only holds for ratio variables)
3. Interval
- Zero is arbitrary (it does not mean the absence of the quantity)
- E.g.: a temperature of 0.0 °C does not mean no heat; the same goes for a pH of 0 (mostly natural-science examples)
4. Ratio
- Like interval, but with a meaningful zero (0 means none of the quantity)
- E.g.: number of conflicts / height / weight / salary / temperature in kelvin
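A small worked example of why "twice as much" needs a meaningful zero: the same two temperatures expressed in Celsius (interval) and in kelvin (ratio). The conversion 0 °C = 273.15 K is standard; the two temperatures are arbitrary.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to kelvin (ratio scale)."""
    return c + 273.15

warm, cool = 20.0, 10.0

# Differences are meaningful on both scales: 10 degrees either way
# (up to floating-point rounding)
print(warm - cool)
print(celsius_to_kelvin(warm) - celsius_to_kelvin(cool))

# Ratios only make sense on the ratio scale
print(warm / cool)  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
print(celsius_to_kelvin(warm) / celsius_to_kelvin(cool))  # roughly 1.035
```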
➔ Distribution:
→ How data values are distributed in relation to other values
→ Frequency distribution: shows all the possible values (or intervals) of the data and how often each occurs
- You can display it with a simple bar chart
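A minimal sketch of a frequency distribution for a continuous variable: hypothetical ages are grouped into 10-year intervals, and each interval gets a plain-text bar as a stand-in for a bar chart. The interval width of 10 is an arbitrary choice for illustration.

```python
from collections import Counter

# Hypothetical ages of twelve respondents
ages = [23, 35, 31, 47, 52, 29, 38, 41, 33, 60, 27, 45]

# Group each age into a 10-year interval, e.g. 23 -> 20 (meaning 20-29)
bins = Counter((age // 10) * 10 for age in ages)

# Print each interval, its frequency, and a text bar
for start in sorted(bins):
    freq = bins[start]
    print(f"{start}-{start + 9}: {freq:2d} {'#' * freq}")
```

For a categorical variable you would skip the grouping step and count the categories directly.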
➔ How can we describe different distributions?
➔ Measures of central tendency
• A single value that describes a set of data by identifying the central position within it
1. Mode
- The most frequently occurring value in the data set
- There can be several modes
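A short illustration of a data set with more than one mode, using Python's standard `statistics` module (`multimode` requires Python 3.8 or later). The scores are hypothetical.

```python
import statistics

# Hypothetical scores in which 5 and 8 both occur twice (bimodal)
scores = [3, 5, 5, 7, 8, 8, 9]

# multimode returns every most-frequent value, in order of first appearance
modes = statistics.multimode(scores)
print(modes)
```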
2. Median
- The middle value once the data have been arranged in order of magnitude
- With an even number of observations there are two middle values: add them and divide by two
Advantage: Not sensitive to extreme values
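The rule above written out as a small function (a sketch; the standard library's `statistics.median` does the same job). The example values are hypothetical and show the odd case, the even case, and insensitivity to an extreme value.

```python
def median(values):
    """Middle value after sorting; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 4]))        # odd n: middle of [1, 4, 7] is 4
print(median([7, 1, 4, 10]))    # even n: (4 + 7) / 2 = 5.5
print(median([7, 1, 4, 1000]))  # still 5.5: the extreme value has no effect
```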
3. Mean
• The average: the sum of all values divided by the number of values
Disadvantage: Sensitive to extreme values and outliers → the median is more useful when the data contain outliers
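A quick comparison of mean and median on hypothetical incomes, before and after replacing the top value with an extreme outlier.

```python
import statistics

# Hypothetical monthly incomes (in euros) for five people
incomes = [2000, 2200, 2400, 2600, 2800]
print(statistics.mean(incomes), statistics.median(incomes))  # mean 2400, median 2400

# Replace the top income with an extreme outlier
with_outlier = incomes[:-1] + [50000]
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean 11840, median 2400
```

One extreme value pulls the mean far above what any of the other four people earn, while the median stays put.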
➔ Measures of dispersion
• How squeezed or stretched the distribution is
(Nominal variables: no natural order of categories)
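The notes stop before naming specific measures of dispersion. As a hedged illustration, here are two common ones, the range and the standard deviation, for two hypothetical data sets with the same mean but different spread. Using the sample standard deviation (`statistics.stdev`, which divides by n - 1) is an assumption; the course may use a different convention.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread
squeezed = [48, 49, 50, 51, 52]
stretched = [30, 40, 50, 60, 70]

for name, data in [("squeezed", squeezed), ("stretched", stretched)]:
    data_range = max(data) - min(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    print(f"{name}: mean={statistics.mean(data)}, range={data_range}, sd={sd:.2f}")
```

Both sets have the same central tendency, so only a measure of dispersion tells them apart.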