
USYD DATA1001 Foundations of Data Science Summary of the lecture slides and additional notes taken 2025 JUNE EXAM PREP (University of Sydney)

Pages
95
Uploaded on
05-06-2025
Written in
2024/2025

Institution
DATA1001 Foundations Of Data Science
Course
DATA1001 Foundations of Data Science












USYD DATA1001 Foundations of Data Science
Summary of the lecture slides and additional notes
taken 2025 JUNE EXAM PREP (University of
Sydney)




1. Articulate the importance of statistics in a data-rich world, including current challenges such as
ethics, privacy and big data.
2. Identify the study design behind a dataset and how the study design affects context
specific outcomes.
3. Produce, interpret and compare graphical and numerical summaries, using base R and
ggplot (extension).
4. Apply the Normal approximation to data, with consideration of measurement error.
5. Model and explain the relationship between 2 variables using linear regression.
6. Use the box model to describe chance and chance variability, including sample surveys and the
central limit theorem.
7. Given real multivariate data and a problem, formulate an appropriate hypothesis and perform a
range of hypothesis tests.
8. Interpret the p-value, conscious of the various pitfalls associated with testing.
9. Critique the use of statistics in media and research papers in a wide variety of data contexts,
with attention to confounding and bias.


EXPLORING DATA

➢ Controlled Experiments

Domain Knowledge
➢ Background context that helps you understand the data (requires curiosity and
becoming a specialist in the area investigated)
➢ E.g. What is Roaccutane prescribed for? How does it work? What are the known side effects?

Types of Evidence
➢ Personal testimony/observation → more generalised finding
➢ source(s) behind media article often poorly cited
➢ In reputable research journal → every study stage should be documented and reviewed

○ Journals require reproducible research → data sets available for verification
& analysis

Design of the Study
➢ Scientists gave Roaccutane to young adult mice for 6 weeks → tested response to stress
➢ Mice on Roaccutane were less mobile → assumed sign of depression.

The Method of Comparison
➢ Scientists use controlled experiment to determine effect of treatment on a response
variable (thing trying to model/predict eg. depression)
○ Treatment Group given new drug
○ Control Group is not
➢ Types of control groups
○ Contemporaneous = occur at the same time as treatment groups
○ Historical = earlier than treatment groups (comparing past experiment)
■ Used if currently an ethical issue
■ BUT were conditions exactly the same?
➢ Must control all variables other than the treatment → same for both groups e.g. psychological
factors (susceptibility to depression)
○ If the groups are not comparable → the differences can confound (mix up) the effect of
the treatment.

3 Potential Confounders (method of allocation)
➢ SELECTION BIAS → Calls for random allocation
○ Bias affects accuracy if allocation is based on the investigator's judgment (non-randomised) e.g.
doctors choosing healthier people to undergo an operation because of the risk of death.
They lived longer, but was that due to the operation or their health?
➢ OBSERVER BIAS → calls for double-blind design (neither subjects nor investigators are aware
of the identity of the 2 groups)
○ Placebo effect = when a subject responds to the idea of treatment.
○ If the subjects/investigators are aware of the identity of the groups → bias in responses or
evaluations
➢ CONSENT BIAS
○ When subjects choose if they take part in experiment eg. polio vaccine → richer
people said yes, poorer more likely to say no
■ Should say they may or may not get vaccine

∴ BEST METHOD OF COMPARISON
➢ Random allocation → no selection bias
➢ Double blind → no observer bias
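
Random allocation can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture material: the subject labels, the group size of 20, the seed and the function name `randomly_allocate` are all assumptions made for the example.

```python
import random


def randomly_allocate(subjects, seed=None):
    """Split subjects into treatment and control groups by chance alone.

    Hypothetical helper: the investigator's judgment plays no part in who
    goes where, which is what guards against selection bias.
    """
    rng = random.Random(seed)      # seeded so the allocation can be reproduced
    shuffled = list(subjects)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject labels, invented for illustration.
subject_ids = [f"mouse_{i:02d}" for i in range(1, 21)]
treatment, control = randomly_allocate(subject_ids, seed=1001)

print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Every subject lands in exactly one group, and the split depends only on the random shuffle. Blinding is a separate step: the group labels would also need to be hidden from subjects and evaluators.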

➢ Observational Studies

In observational studies
➢ Investigator cannot use randomisation for allocation of subjects into treatment
and control groups
➢ Used in most educational research

Precautions

1. Observational studies can't establish causation
➢ Can only establish association (link)
➢ Points to but does not prove causation (may not cause, but increase risk of)
➢ E.g. smokers are more likely to get liver cancer, but this does not imply that smoking causes it
○ Smokers drink more alcohol ∴ effect of smoking is confounded with
alcohol consumption
2. Can have misleading hidden confounders
➢ Confounding occurs when the apparent effect of the treatment is actually caused by some other
variable(s) that differ between the Treatment and Control Groups
➢ Confounding variables can be introduced due to:
○ selection bias → some subjects more likely to be chosen eg. investigators select
healthier subjects for surgery
○ survivor bias → dropout of some subjects eg. "improvement" due to dropout of
worst/unresponsive subjects
○ adherers and non-adherers → subjects who stick to the treatment tend to be more compliant
and healthier already, unlike non-adherers (e.g. those who do not take the drug)
Strategy for dealing with confounders (controlling for confounders)
➢ Make groups more comparable by dividing into subgroups with respect to confounder
○ Eg. Controlling for alcohol consumption → split up smokers according to
alcohol consumption
○ Limitations
■ Need to find confounder (often hard to find)
3. Observational studies with confounding variable can lead to Simpson's Paradox
➢ Simpson's paradox (reversal paradox) → a trend in individual groups of data that
disappears or reverses when the groups are pooled together, due
to a confounding variable






➢ E.g. more young women smoked than older women, and since younger women are expected to live
longer, pooling all age groups makes smoking appear beneficial (age = confounding
variable)
4. Observational studies can result from using historical controls
➢ Time is a confounding variable e.g. comparing new medication on current patients vs.
old medication on past patients (treatment & control groups may differ due to societal
change)
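
Simpson's paradox and the subgroup strategy for controlling a confounder can be checked numerically. The sketch below uses hypothetical counts invented purely for illustration (they are not from the smoking study mentioned in the notes). It compares death rates within each age group and then after pooling.

```python
# Hypothetical counts, invented for illustration: (number of subjects, deaths).
counts = {
    "young": {"smoker": (400, 20), "non_smoker": (200, 6)},
    "old": {"smoker": (100, 50), "non_smoker": (400, 160)},
}


def death_rate(n, deaths):
    """Proportion of subjects in a group who died."""
    return deaths / n


def pooled_rate(status):
    """Death rate for one smoking status after pooling all age groups."""
    total_n = sum(counts[age][status][0] for age in counts)
    total_deaths = sum(counts[age][status][1] for age in counts)
    return total_deaths / total_n


# Controlling for the confounder: compare smokers and non-smokers within each age group.
for age, groups in counts.items():
    smoker = death_rate(*groups["smoker"])
    non_smoker = death_rate(*groups["non_smoker"])
    print(f"{age}: smokers {smoker:.1%}, non-smokers {non_smoker:.1%}")

# Ignoring the confounder: pool the age groups together.
print(
    f"pooled: smokers {pooled_rate('smoker'):.1%}, "
    f"non-smokers {pooled_rate('non_smoker'):.1%}"
)
```

With these invented numbers, smokers have the higher death rate within each age group but the lower death rate once the groups are pooled. The reversal arises because most of the smokers are young and most of the non-smokers are old, so age is mixed up with smoking status.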
