Summary of the lecture slides and additional notes
taken 2025 JUNE EXAM PREP (University of
Sydney)
1. Articulate the importance of statistics in a data-rich world, including current challenges such as
ethics, privacy and big data.
2. Identify the study design behind a dataset and how the study design affects
context-specific outcomes.
3. Produce, interpret and compare graphical and numerical summaries, using base R and
ggplot (extension).
4. Apply the Normal approximation to data, with consideration of measurement error.
5. Model and explain the relationship between 2 variables using linear regression.
6. Use the box model to describe chance and chance variability, including sample surveys and the
central limit theorem.
7. Given real multivariate data and a problem, formulate an appropriate hypothesis and perform a
range of hypothesis tests.
8. Interpret the p-value, conscious of the various pitfalls associated with testing.
9. Critique the use of statistics in media and research papers in a wide variety of data contexts,
with attention to confounding and bias.
EXPLORING DATA
➢ Controlled Experiments
Domain Knowledge
➢ Background context that helps you understand data (need curiosity and to
become a specialist in the area investigated)
➢ Eg. What is Roaccutane prescribed for? How does it work? What are the known side effects?
Types of Evidence
➢ Personal testimony/observation → more generalised finding
➢ source(s) behind media article often poorly cited
➢ In reputable research journal → every study stage should be documented and reviewed
○ journals require reproducible research → data sets available for verification
& analysis
Design of the Study
➢ Scientists gave Roaccutane to young adult mice for 6 weeks → tested response to stress
➢ Mice on Roaccutane were less mobile → assumed sign of depression.
The Method of Comparison
➢ Scientists use controlled experiment to determine effect of treatment on a response
variable (thing trying to model/predict eg. depression)
○ Treatment Group given new drug
○ Control Group is not
➢ Types of control groups
○ Contemporaneous = occur at the same time as treatment groups
○ Historical = earlier than treatment groups (comparing past experiment)
■ Used if running a control group now would raise an ethical issue
■ BUT were conditions exactly the same?
➢ Must control all variables other than the treatment → same for both groups eg. psychological
factors (susceptibility to depression)
○ If the groups are not comparable → differences between them can confound (mix up) the effect of
the treatment.
3 Potential Confounders (method of allocation)
➢ SELECTION BIAS → Calls for random allocation
○ Bias affects accuracy if allocation is based on the investigator's judgment (non-randomised) eg.
doctors choosing healthier people to undergo an operation due to the risk of death.
They lived longer, but was that due to the operation or their health?
➢ OBSERVER BIAS → calls for double-blind design (neither subjects nor investigators are aware of
the identity of the 2 groups)
○ Placebo effect = when a subject responds to the idea of treatment.
○ If the subjects/investigators are aware of the identity of the groups → bias in responses or
evaluations
➢ CONSENT BIAS
○ When subjects choose whether they take part in the experiment eg. polio vaccine → richer
people said yes, poorer more likely to say no
■ Subjects should be told they may or may not get the vaccine
∴ BEST METHOD OF COMPARISON
➢ Random allocation → no selection bias
➢ Double blind → no observer bias
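Random allocation can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `randomly_allocate`, the mouse labels and the seed are hypothetical and not from the lecture.

```python
import random

def randomly_allocate(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups.

    Hypothetical helper: chance, not the investigator, decides who is treated.
    """
    rng = random.Random(seed)      # seeded only so the sketch is reproducible
    shuffled = list(subjects)      # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects: 20 mice labelled mouse_1 ... mouse_20
mice = [f"mouse_{i}" for i in range(1, 21)]
treatment, control = randomly_allocate(mice, seed=1)

print("Treatment group:", treatment)
print("Control group:", control)
```

Because the split depends only on the shuffle, the investigator's judgment cannot influence which subjects receive the treatment.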
➢ Observational Studies
In observational studies
➢ Investigator cannot use randomisation for allocation of subjects into treatment
and control groups
➢ Used in most educational research
Precautions
1. Observational studies can't establish causation
➢ Can only establish association (link)
➢ Points to but does not prove causation (may not cause, but increase risk of)
➢ Eg. smokers are more likely to get liver cancer, but this does not imply smoking causes it
○ Smokers drink more alcohol ∴ effect of smoking is confounded with
alcohol consumption
2. Can have misleading hidden confounders
➢ Confounding occurs when the apparent effect of the treatment is caused by some other variable/s
that differ between the Treatment and Control Groups
➢ Confounding variables can be introduced due to:
○ selection bias → some subjects more likely to be chosen eg. investigators select
healthier subjects for surgery
○ survivor bias → dropout of some subjects eg. "improvement" due to dropout of
worst/unresponsive subjects
○ adherers and non-adherers → some subjects are more compliant (stick to the treatment) and healthier
already, while others do not comply eg. do not take the drug
Strategy for dealing with confounders (controlling for confounders)
➢ Make groups more comparable by dividing into subgroups with respect to confounder
○ Eg. Controlling for alcohol consumption → split up smokers according to
alcohol consumption
○ Limitations
■ Need to find confounder (often hard to find)
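Controlling for a confounder by splitting into subgroups can be sketched as below. All counts are hypothetical, invented purely to illustrate the idea (they are not real data), and the function names are assumptions.

```python
# Hypothetical counts: (alcohol level, smoking status) -> (disease cases, group size)
counts = {
    ("heavy", "smoker"): (16, 80),
    ("heavy", "non-smoker"): (4, 20),
    ("light", "smoker"): (1, 20),
    ("light", "non-smoker"): (4, 80),
}

def pooled_rate(status):
    """Disease rate for a smoking status, ignoring alcohol consumption."""
    cases = sum(c for (_, s), (c, n) in counts.items() if s == status)
    size = sum(n for (_, s), (c, n) in counts.items() if s == status)
    return cases / size

def stratum_rate(alcohol, status):
    """Disease rate for a smoking status within one alcohol subgroup."""
    cases, size = counts[(alcohol, status)]
    return cases / size

# Without controlling for alcohol, smokers look worse off
print("pooled:", pooled_rate("smoker"), "vs", pooled_rate("non-smoker"))

# Within each alcohol subgroup, compare smokers with non-smokers
for alcohol in ("heavy", "light"):
    print(alcohol, "drinkers:",
          stratum_rate(alcohol, "smoker"), "vs",
          stratum_rate(alcohol, "non-smoker"))
```

In these made-up numbers the pooled rates differ, but the rates match within each alcohol subgroup, so the apparent effect of smoking is explained by alcohol consumption.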
3. Observational studies with a confounding variable can lead to Simpson's Paradox
➢ Simpson’s paradox (reversing paradox) → a trend that appears in individual groups of data
disappears or reverses when the groups are pooled together (trend/percentages reversed) due
to a confounding variable
➢ Eg. More young women smoked than older women, and since younger women are expected to live
longer, pooling all age groups makes smoking appear beneficial (age = confounding
variable)
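The reversal can be checked numerically with a small Python sketch. The counts below are hypothetical, made up only to reproduce the pattern in the example above (they are not the real study data), and `death_rate` is an assumed helper name.

```python
# Hypothetical counts: (age group, smoking status) -> (deaths, group size)
deaths = {
    ("young", "smoker"): (5, 100),
    ("young", "non-smoker"): (2, 50),
    ("old", "smoker"): (12, 20),
    ("old", "non-smoker"): (50, 100),
}

def death_rate(status, age=None):
    """Death rate for a smoking status, within one age group or pooled (age=None)."""
    rows = [v for (a, s), v in deaths.items()
            if s == status and (age is None or a == age)]
    return sum(d for d, _ in rows) / sum(n for _, n in rows)

# Within each age group, smokers have the higher death rate
for age in ("young", "old"):
    print(age, death_rate("smoker", age), death_rate("non-smoker", age))

# Pooled over age, the comparison reverses
print("pooled", death_rate("smoker"), death_rate("non-smoker"))
```

Most smokers in these made-up counts are young, so pooling mixes a mostly young smoking group with a mostly old non-smoking group, which is how age confounds the comparison.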
4. Observational studies can result from using historical controls
➢ Time is a confounding variable eg. comparing new medication on current patients vs.
old medication on past patients (treatment & control groups may differ due to societal
change)