STATISTICS SUMMARY
Chapter 1 – What is statistics?
- Statistics = a way to get information from data
- Descriptive Statistics: summarizing and presenting data in an effective way
- Inferential Statistics: drawing conclusions about a population based on sample data
- Key statistical concepts
Population = group of all items of interest to a statistics practitioner
Parameter = descriptive measure of a population
Sample = set of data drawn from the studied population
Statistic = descriptive measure of a sample
Statistical inference = the process of making an estimate, prediction or decision about a
population based on sample data – two measures of reliability
Confidence level: proportion of times that an estimating procedure will be correct
Significance level: how frequently the conclusion will be wrong
Chapter 2 – Graphical Descriptive Techniques I
- Types of data and information
Variable = characteristic of a population or sample
Values = possible observations of the variable
Data = observed values of the variable (datum is singular)
- Four types of data
Ratio = highest level, has an absolute zero point – quantitative
All calculations allowed: = ≠ < > + - * / (the average is often calculated)
Interval = numbers such as heights, weights or incomes – quantitative & numerical
Ordinal = categories where the order of values has meaning – ranking = ≠ < >
No specific graphical technique: bar charts and pie charts can be used
Nominal = categories such as single, married or divorced – qualitative & categorical
Can only count frequency or percentage of occurrence (relative frequency): = ≠
Frequency distribution presented in bar chart or pie chart (proportions)
- Higher-level data may be treated as lower-level data but not the other way around
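The frequency and relative frequency counts described for nominal data can be sketched in Python as follows. This is only an illustration: the marital-status observations and the helper name frequency_table are made up for the example and are not part of the notes.

```python
from collections import Counter

# Hypothetical nominal observations (illustrative only, not from the notes).
observations = ["single", "married", "married", "divorced", "single", "married"]


def frequency_table(data):
    """Return {category: (frequency, relative frequency)} for nominal data."""
    counts = Counter(data)
    n = len(data)
    return {category: (count, count / n) for category, count in counts.items()}


table = frequency_table(observations)
for category, (freq, rel) in table.items():
    # These are the numbers a bar chart (frequencies) or pie chart (proportions) would show.
    print(f"{category:10s} {freq:3d} {rel:.2f}")
```

Only counting is used here, which matches the operations allowed for nominal data.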
Chapter 3 – Graphical Descriptive Techniques II
- Histogram is used for interval data
Classes = the series of intervals into which the observations fall
Intervals don’t overlap, every observation is assigned and the intervals are equally wide
- Number of classes is based on the number of observations n (Sturges’ formula): number of classes = 1 + 3.3 log10(n)
- Width of class = (largest observation – smallest observation) / number of classes
- Shapes of a histogram
Symmetric: two sides identical in shape and size
Skewness: long tail extending to right (positively skewed \) or left (negatively skewed /)
Modal class: class with the largest number of observations
Unimodal histogram: has only one peak – bimodal histogram: has two peaks
Bell Shape: a special type of symmetric unimodal histogram
- Stem-and-leaf display is similar to a histogram but shows the actual observations
- Relative frequency distribution is created by dividing frequencies by the number of observations
Total sum is always 1.0 (100%)
Cumulative relative frequency distribution highlights the proportion of observations below each class limit
Ogive is the graphical representation of the cumulative relative frequencies
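The steps of this chapter (number of classes, class width, relative and cumulative relative frequencies) can be sketched in Python. This is a minimal sketch under stated assumptions: the data values and function names are hypothetical, the number of classes is rounded to the nearest integer, and each class includes its lower limit (a textbook may use the opposite convention).

```python
import math

# Hypothetical interval data (illustrative only, not from the notes).
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31, 33, 35, 38, 41, 47, 50]


def number_of_classes(n):
    """Number of classes = 1 + 3.3 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.3 * math.log10(n))


def class_frequencies(values, k):
    """Count observations in k equally wide, non-overlapping classes."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all observations are equal, class width would be 0")
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # Assumption: lower limit inclusive; the largest observation goes in the last class.
        index = min(int((v - lo) / width), k - 1)
        counts[index] += 1
    return width, counts


k = number_of_classes(len(data))
width, counts = class_frequencies(data, k)

# Relative frequencies: divide each class frequency by the number of observations.
relative = [c / len(data) for c in counts]

# Cumulative relative frequencies: the values an ogive would plot.
cumulative = []
running = 0.0
for r in relative:
    running += r
    cumulative.append(running)

print(k, width, counts)
print(relative)
print(cumulative)
```

The last cumulative value should equal 1.0, which is a quick check that every observation was assigned to exactly one class.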
Chapter 4 – Numerical descriptive techniques
- Measures of central location – three different measures
Mean: μ = the average - only for interval and ratio data (formula sheet)
Median = middle observation when all observations are placed in order
Not as sensitive to extreme values as the mean
Preferred when there are few observations or when there are extreme observations (ordinal, interval, ratio)
Mode = observation that occurs with the greatest frequency
For populations and large samples report modal class
- Measures of variability – spread of the observations (only for interval and ratio data)
Range = largest observation – smallest observation
Variance: σ² (population) and s² (sample) – (formula sheet)
Standard deviation: σ = √σ² and s = √s² (square root of the variance)
Mean absolute deviation (MAD) = average of the absolute deviations from the mean
- Empirical Rule can be used when histogram is bell shaped
Approximately 68% of all observations fall within one standard deviation of the mean
Approximately 95% of all observations fall within two standard deviations of the mean
Approximately 99.7% of all observations fall within three standard deviations of the mean
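- Chebysheff’s Theorem applies to all shapes of histograms: the proportion of observations within k standard deviations of the mean is at least 1-(1/k²) for k>1
- Coefficient of Variation: CV = σ / μ (population) and cv = s / x̄ (sample)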
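The measures of this chapter can be computed with Python's standard statistics module. The sketch below is illustrative only: the sample values and the two helper functions (chebysheff_lower_bound, proportion_within) are hypothetical names introduced for the example.

```python
import statistics

# Hypothetical interval data (illustrative only, not from the notes).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central location.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of variability.
value_range = max(sample) - min(sample)
population_variance = statistics.pvariance(sample)  # sigma squared, divides by n
sample_variance = statistics.variance(sample)       # s squared, divides by n - 1
sigma = statistics.pstdev(sample)                   # square root of population variance
s = statistics.stdev(sample)                        # square root of sample variance
mad = sum(abs(x - mean) for x in sample) / len(sample)

# Coefficient of variation for the sample: cv = s / x-bar.
cv = s / mean


def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebysheff's Theorem requires k > 1")
    return 1 - 1 / k**2


def proportion_within(values, center, spread, k):
    """Observed proportion of values within k * spread of center."""
    inside = sum(1 for x in values if abs(x - center) <= k * spread)
    return inside / len(values)


print(mean, median, mode, value_range)
print(population_variance, sample_variance, sigma, s, mad, cv)
print(chebysheff_lower_bound(2), proportion_within(sample, mean, sigma, 2))
```

Comparing proportion_within with chebysheff_lower_bound for the same k shows the theorem as a lower bound: the observed proportion should never fall below it, whatever the shape of the histogram.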
Chapter 6 – Probability
- Probability provides a link between population and sample
- Random experiment = action/process that leads to one of several possible outcomes
Example: flip a coin – either head or tail; or grade on a test – A, B, C, D or F
List of outcomes includes all possibilities, and no two outcomes can occur at the same time
Sample space: S = list of all possible outcomes – exhaustive and mutually exclusive
- Two requirements of probabilities – given sample space S = {O1, O2, …, Ok}
Probability of each outcome is between 0 and 1: 0 ≤ P(Oi) ≤ 1 for each i
Sum of the probabilities of all outcomes is 1: P(O1) + P(O2) + … + P(Ok) = 1
- Classical approach: used for games of chance – head or tail is 50% each
- Relative frequency approach: long-run relative frequency, look at past – 200 out of 1000 is 20%
This approach is always used to interpret probability
- Subjective approach: define probability as degree of belief – e.g. analysing the factors influencing a stock
- Event = collection/set of one or more individual outcomes in a sample space
- Probability of an event = sum of the probabilities of the simple events that make up the event
- Intersection of Events A and B: event that occurs when both A and B occur – A and B
Probability of the intersection = joint probability
- Marginal probabilities: calculated by adding the joint probabilities across rows or down columns
- Conditional probability: probability of A given event B – P(A|B) = P(A and B) / P(B)
- Union of Events A and B: event that occurs when either A or B or both occur – A or B
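The joint, marginal, conditional and union definitions above can be sketched in Python with a small joint probability table. The four probabilities, the event labels and the function names are hypothetical, chosen only so that the two requirements of probabilities hold.

```python
# Hypothetical joint probabilities P(row event and column event); illustrative only.
joint = {
    ("A", "B"): 0.10,
    ("A", "not B"): 0.30,
    ("not A", "B"): 0.20,
    ("not A", "not B"): 0.40,
}


def check_requirements(table):
    """Each probability lies in [0, 1] and all probabilities sum to 1."""
    in_range = all(0 <= p <= 1 for p in table.values())
    sums_to_one = abs(sum(table.values()) - 1.0) < 1e-9
    return in_range and sums_to_one


def marginal(table, position, label):
    """Add joint probabilities across a row (position 0) or down a column (position 1)."""
    return sum(p for events, p in table.items() if events[position] == label)


def conditional(table, a, b):
    """P(a | b) = P(a and b) / P(b), with a a row event and b a column event."""
    return table[(a, b)] / marginal(table, 1, b)


def union(table, a, b):
    """P(a or b): sum of the outcomes in which a occurs, b occurs, or both occur."""
    return sum(p for (row, col), p in table.items() if row == a or col == b)


print(check_requirements(joint))
print(marginal(joint, 0, "A"), marginal(joint, 1, "B"))
print(conditional(joint, "A", "B"))
print(union(joint, "A", "B"))
```

The union is computed directly from its definition, as the sum of the probabilities of the outcomes that make up the event, so no additional probability rule is needed.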
- Probability rules