Regression Notes
Statistical inference: The attempt to reach a conclusion concerning all members of a
class (population) from observations of only some of them (sample).
Population: A collection of observations.
• Numerical descriptor = A parameter.
• Population size = N.
• Mean (a measure of centre) = μ.
• Variance (a measure of dispersion) = σ².
• Standard deviation = σ.
Sample: A part or subset of a population.
• Numerical descriptor = A statistic.
• Sample size = n.
• Sample mean = 𝑥̅.
• Sample variance = s².
• Sample standard deviation = s.
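Example: the sketch below computes the population parameters and the sample statistics listed above with Python's standard library. The data values are made up for illustration, and the sample variance is assumed to use the usual n − 1 divisor, which these notes do not state explicitly.

```python
import statistics

# Hypothetical population of N = 8 observations (illustrative values only).
population = [4.0, 7.0, 5.0, 9.0, 6.0, 8.0, 3.0, 6.0]
N = len(population)
mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)
sigma = statistics.pstdev(population)      # population standard deviation

# A sample: here simply the first n = 4 observations of the population.
sample = population[:4]
n = len(sample)
x_bar = statistics.fmean(sample)           # sample mean
s2 = statistics.variance(sample)           # sample variance (divides by n - 1, assumed)
s = statistics.stdev(sample)               # sample standard deviation

print(f"Population: N={N}, mean={mu}, variance={sigma2}, sd={sigma:.4f}")
print(f"Sample:     n={n}, mean={x_bar}, variance={s2:.4f}, sd={s:.4f}")
```

The parameters describe the whole population and are fixed; the statistics depend on which observations happen to fall in the sample.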
Hypothesis testing:
• Define the null hypothesis, H0.
• Define the alternative hypothesis, Ha, where Ha is a form of “not H0”.
• Define the type 1 error rate (the probability of falsely rejecting the null hypothesis), α,
usually 0.05.
• Calculate the test statistic: z = (𝑥̅ − μ) / σ.
• Calculate the p-value (the probability of getting a result “as or more extreme”
than observed if the null hypothesis is true).
• If the p-value is < α, reject H0. Otherwise, fail to reject H0.
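Example: a minimal sketch of the steps above for a test about a population mean. It rests on assumptions not stated in these notes: the alternative hypothesis is two-sided, the population standard deviation is known, and the denominator of z is the standard deviation of the sample mean, σ/√n. The function name z_test and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu = mu0 (sigma assumed known)."""
    standard_error = sigma / math.sqrt(n)   # standard deviation of the sample mean
    z = (x_bar - mu0) / standard_error      # test statistic
    # Probability of a result as or more extreme than |z| if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = p_value < alpha                # decision rule
    return z, p_value, reject


# Illustrative numbers only: sample mean 52 from n = 25, H0: mu = 50, sigma = 5.
z, p_value, reject = z_test(x_bar=52.0, mu0=50.0, sigma=5.0, n=25)
decision = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

For a one-sided Ha, the p-value would use only one tail of the normal distribution instead of doubling it.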
Correlation analysis: Measures the strength and direction of the linear relationship
between two variables.
Correlation coefficient
• Population correlation coefficient (ρ) – Measures the strength of the association
between the variables.
• Sample correlation coefficient (r) – An estimate of ρ, used to measure the
strength of the linear relationship in the sample observations.
• The sample correlation coefficient (r):
• Unit free.
• Lies between −1 (perfect negative linear relationship) and 1 (perfect positive linear relationship).
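Example: a sketch of the two properties above, assuming r is the usual Pearson product-moment coefficient (these notes do not give the formula here). The function name sample_correlation and the data are hypothetical.

```python
import math


def sample_correlation(x, y):
    """Pearson sample correlation coefficient r for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # Note: r is undefined (division by zero) if either variable is constant.
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 58.0, 69.0, 75.0, 88.0]

r_metric = sample_correlation(heights_cm, weights_kg)

# Unit free: converting heights to inches leaves r unchanged.
heights_in = [h / 2.54 for h in heights_cm]
r_imperial = sample_correlation(heights_in, weights_kg)

# Boundary case: points lying exactly on a falling straight line.
r_perfect_neg = sample_correlation([1, 2, 3], [6, 4, 2])

print(f"r (cm) = {r_metric:.4f}, r (inches) = {r_imperial:.4f}")
print(f"r for a perfect negative line = {r_perfect_neg}")
```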
• Significance of the correlation coefficient: