STATS 401 FINAL EXAM WITH CORRECT SOLUTIONS, 100% VERIFIED
Classical statistics and modern big-data methods have different goals and assumptions.
Explain how these two approaches differ in terms of their objectives, data constraints,
and inference methods.
Classical statistics:
Objectives: extract meaningful information from data and draw conclusions (inferences) about a population
Data constraints: works with a sample when the population is too large to measure fully. But
random sampling doesn't guarantee representativeness, and a biased sample can occur when
certain groups are overrepresented or underrepresented. Bias can lead to invalid conclusions.
Inference methods: parametric tests (t-tests), confidence intervals, maximum likelihood
estimation.
Modern big-data methods:
Objectives: accurate prediction, predict future trends, classify data, identify patterns in
datasets
Data constraints: modern datasets are so large that calculations by hand are impractical, and
datasets frequently lack the randomness needed for traditional statistical inference, so it is
hard to draw conclusions about the population in the classical sense
Inference methods: Machine Learning models (decision trees, neural networks),
cross-validation for model selection and evaluation
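As a sketch (not from the original notes), k-fold cross-validation can be implemented by hand: split the data into k folds, train on k-1 of them, and score on the held-out fold. The `fit_mean`/`predict_mean` "model" below is a made-up placeholder used only to show the mechanics.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Partition indices 0..n-1 into k roughly equal random folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5):
    """Average squared prediction error over k held-out folds."""
    errors = []
    for fold in k_fold_splits(len(xs), k):
        hold = set(fold)
        # Train on everything outside the current fold.
        train_y = [y for i, y in enumerate(ys) if i not in hold]
        train_x = [x for i, x in enumerate(xs) if i not in hold]
        model = fit(train_x, train_y)
        # Score on the held-out fold.
        errors.extend((predict(model, xs[i]) - ys[i]) ** 2 for i in fold)
    return sum(errors) / len(errors)

# Toy "model": always predict the mean of the training responses.
fit_mean = lambda xs, ys: sum(ys) / len(ys)
predict_mean = lambda m, x: m
```

Each data point is held out exactly once, so the averaged error estimates out-of-sample performance rather than fit to the training data.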
Frequentist interpretation: probability = the long-run relative frequency of an event over many
repeated trials
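As a quick illustration of the long-run frequency idea (a simulation sketch, not part of the exam answer): flip a fair coin many times and watch the relative frequency of heads settle near 0.5.

```python
import random

def heads_frequency(n_flips, seed=1):
    """Relative frequency of heads in n_flips simulated fair-coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips
```

For small n_flips the frequency bounces around; as n_flips grows it converges toward the probability 0.5, which is exactly the frequentist definition.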
Define a p-value. Explain reasoning for why a small p-value leads to the rejection of the
null hypothesis.
P-value: the probability of observing the collected data (or something more extreme) under the
assumption that the null hypothesis is true
Explain why a small p-value leads to rejection of the null hypothesis: if the p-value is
smaller than the significance level alpha, then the probability of observing data this extreme
by randomness alone, assuming the null is true, is small. Since we actually did observe this
sample, the assumption made by the null hypothesis is likely wrong. We conclude that we have
evidence to reject the null.
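A minimal sketch of the computation (assuming a two-sided z-test with known population standard deviation; the sample values here are invented for illustration):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: mu = mu0, with known sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - normal_cdf(abs(z)))

# Hypothetical example: n=36, sample mean 105, H0 mean 100, sigma 15.
p = z_test_p_value(105, 100, 15, 36)
```

Here z = 2.0, giving p ≈ 0.046; at alpha = 0.05 this p-value is small enough to reject the null, matching the reasoning above.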
Explain the CLT and describe why it is important for making statistical inferences.
Provide example of how CLT is used in hypothesis testing.
Explain: regardless of the shape of the original population, the sampling distribution of the
sample mean x_bar approaches a Normal distribution, provided the observations are independent
and the sample size is sufficiently large (a common rule of thumb is n > 30).
Important in making statistical inferences because: it lets us infer population parameters
from sample statistics even when the population distribution is unknown. A common challenge is
not knowing the population standard deviation -> use the sample standard deviation instead.
Example of how it's used in hypothesis testing: in Student's two-sample t-test and Welch's
two-sample t-test, we rely on the approximate normality of the sample means. The CLT is what
justifies using z-tests, t-tests, and ANOVA.
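The CLT can be seen directly in simulation (a sketch, not from the original notes): draw many samples from a strongly skewed Exponential(1) population and look at the distribution of their means.

```python
import random
import statistics

def sample_means(n_samples, n, seed=2):
    """Means of n_samples samples of size n from a skewed Exponential(1) population."""
    rng = random.Random(seed)
    return [statistics.mean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(n_samples)]

# Exponential(1) has population mean 1 and sd 1, so the CLT predicts the
# sample means are approximately Normal(1, 1/sqrt(n)) for large n.
means = sample_means(2000, 40)
```

Even though the underlying population is far from normal, a histogram of `means` is roughly bell-shaped and centered at 1, with spread close to the CLT's predicted 1/sqrt(40).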
Explain difference between observational study and a controlled experiment.
Difference between:
Observational study: researchers observe subjects and collect data without interfering in the
subjects' environment or decisions.
Shows association but cannot establish causation (confounding variables may be present)
Controlled experiment: researchers randomly assign subjects to a treatment group or a control
group to evaluate the effect of an intervention.
Because randomization balances confounders across groups, cause and effect can be concluded
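The randomization step that distinguishes a controlled experiment can be sketched in a few lines (an illustration, not part of the exam answer): shuffle the subject pool and split it in half.

```python
import random

def random_assignment(subjects, seed=3):
    """Randomly split subjects into treatment and control groups of near-equal size."""
    pool = list(subjects)
    random.Random(seed).shuffle(pool)  # randomization balances confounders on average
    half = len(pool) // 2
    return pool[:half], pool[half:]

treatment, control = random_assignment(range(20))
```

Because assignment depends only on the random shuffle and not on any subject characteristic, differences in outcomes between the groups can be attributed to the intervention.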