Summary Statistics 2 (P_BSTATIS_2)

A concise summary of the most important content from the Statistics 2 course (P_BSTATIS_2), based on the lectures and the book: Alan Agresti (2018). Statistical Methods for the Social Sciences, 5th global edition. Pearson Education International.


chapter 10 introduction to multivariate relationships
causal relationships are asymmetrical → 𝑥 causes 𝑦
- association between variables
o as 𝑥 changes, the distribution of 𝑦 should change in some way
o association does NOT imply causation
- appropriate time order
- elimination of alternative explanations
o observational studies can never prove that 1 variable is a cause of another
- anecdotal evidence is not enough to disprove causality unless it undermines 1 of the 3 criteria
- randomized experiments are the standard for establishing causality, although this isn't always possible in social research

in multivariate analysis, a variable is said to be controlled when its influence is removed
- randomized experiments inherently control other variables in a probabilistic sense

statistical control: approximating an experimental type of control by grouping observations with equal/similar values on the control variables in observational research

control variable: any variable that is held constant
lurking variable: a variable that is not measured in a study but does influence the association

multivariate associations
- spurious: both 𝑥1 and 𝑦 are dependent on 𝑥2, but their association disappears when 𝑥2 is controlled (see the sketch after this list)
- chain relationship: the relationship between 𝑥1 and 𝑦 exists but is indirect; 𝑥2 is an intervening variable or mediator
- multiple causes: can either be independent or dependent (= there exists a relationship between the causes themselves)
- suppressor: when controlling for a suppressor variable, the association between 2 variables increases
- interaction: an association has different strengths and/or directions at different values of the control variable
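a minimal numpy sketch (hypothetical data and effect sizes) of a spurious association: 𝑥1 and 𝑦 both depend on 𝑥2, so they correlate strongly overall, but the association disappears once we group observations with similar 𝑥2-values (statistical control):

import numpy as np

rng = np.random.default_rng(0)

# hypothetical data: x2 drives both x1 and y; x1 has no direct effect on y
n = 10_000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(scale=0.6, size=n)
y = 0.8 * x2 + rng.normal(scale=0.6, size=n)

# the bivariate association between x1 and y is sizeable (~0.64)
print(np.corrcoef(x1, y)[0, 1])

# statistical control: inspect the association within a narrow band of x2
band = np.abs(x2) < 0.1
print(np.corrcoef(x1[band], y[band])[0, 1])  # close to 0: spurious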




Simpson’s paradox: the possibility that, after controlling for a variable, each association has the opposite direction from the bivariate association
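a worked example with hypothetical counts: treatment A does better within both severity groups, yet B does better overall, because A was mostly tried on severe cases and B on mild ones:

# hypothetical (successes, trials) per treatment and severity group
mild = {"A": (90, 100), "B": (800, 1000)}
severe = {"A": (300, 1000), "B": (20, 100)}

for label, group in (("mild", mild), ("severe", severe)):
    for t in ("A", "B"):
        s, n = group[t]
        print(label, t, s / n)  # A wins in both groups (.90 vs .80, .30 vs .20)

for t in ("A", "B"):
    s = mild[t][0] + severe[t][0]
    n = mild[t][1] + severe[t][1]
    print("overall", t, s / n)  # yet B wins overall (~.35 vs ~.75)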

confounding: when 2 explanatory variables both have effects on a response variable but are also associated with each other
- omitted variable bias: bias that arises when a study neglects to observe a confounding variable that explains a major part of the effect



chapter 9 linear regression and correlation
non-directional: 𝑥 predicts 𝑦
directional:
- pos association: higher 𝑥 predicts higher 𝑦
- neg association: higher 𝑥 predicts lower 𝑦

linear regression model: 𝑦̂ = 𝑎 + 𝑏𝑥
- predicted criterion value → 𝑦̂
- 𝑦-intercept → 𝑎
- slope → 𝑏
o pos when high 𝑥-values coincide with high 𝑦-values, and vice versa
o neg when low 𝑥-values coincide with high 𝑦-values, and vice versa
o we can’t use 𝑏 to interpret the strength of the association between 𝑥 and 𝑦
▪ 𝑏 depends on the scale (see the sketch below)
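a quick numpy check with made-up data: rescaling 𝑥 changes 𝑏 but leaves 𝑟 untouched, which is why 𝑏 can't measure strength:

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)                   # hypothetical predictor
y = 2.0 * x + rng.normal(size=200)

b_original = np.polyfit(x, y, 1)[0]        # slope on the original scale
b_rescaled = np.polyfit(100 * x, y, 1)[0]  # x rescaled, e.g. metres -> centimetres
print(b_original, b_rescaled)              # slope shrinks by the factor 100

r_original = np.corrcoef(x, y)[0, 1]
r_rescaled = np.corrcoef(100 * x, y)[0, 1]
print(r_original, r_rescaled)              # r is unchanged: scale-free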

we consider 3 types of 𝑦:
- 𝑦: observed outcome value of an individual
- 𝑦̅: avg outcome value (mean of 𝑦)
- 𝑦̂: individual’s predicted outcome value based on model

least squares estimation: the best straight line is the one falling closest to all data points in the scatterplot, i.e. the line that minimizes the sum of squared residuals

Pearson’s correlation: 𝑏* = 𝑟 = (𝑠𝑥 /𝑠𝑦 ) 𝑏
- interpretation: 0 < negligible < .10 ≤ small < .30 ≤ moderate < .50 ≤ large
- both 𝑟 and 𝑏* are measures of effect size
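a small check of this identity with scipy on simulated data (linregress does the least squares fit):

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(2)
x = rng.normal(size=200)                   # hypothetical data
y = 1.5 * x + rng.normal(size=200)

fit = linregress(x, y)                     # least squares fit: y-hat = a + b*x
b, r = fit.slope, fit.rvalue

s_x = np.std(x, ddof=1)
s_y = np.std(y, ddof=1)
print(r, (s_x / s_y) * b)                  # identical: r = (s_x / s_y) * b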

residual (𝒆): vertical distance between observed 𝑦 and predicted 𝑦̂
- 𝑒 = 𝑦 − 𝑦̂
- we can use this residual to determine how well the model performs in predicting 𝑦

total sum of squares: 𝑇𝑆𝑆 = ∑(𝑦 − 𝑦̅)²
- how much variation is there in the to-be-explained dependent variable
- marginal variation

sum of squared errors: 𝑆𝑆𝐸 = ∑(𝑦 − 𝑦̂)²
- how much variation is still unexplained after adding the independent variable
- conditional variation

regression sum of squares: 𝑅𝑆𝑆 = ∑(𝑦̂ − 𝑦̅)²
- how much variation is explained by adding the independent variable

the smaller the 𝑆𝑆𝐸, the better the prediction → 𝑆𝑆𝐸 = 𝑇𝑆𝑆 − 𝑅𝑆𝑆

we use the different sums of squares to inspect the explanatory power of the model and to test for significance


coefficient of determination (𝑹𝟐): proportion of variation in 𝑦 that is explained by the model
- 𝑅² = (𝑇𝑆𝑆 − 𝑆𝑆𝐸)/𝑇𝑆𝑆 = (∑(𝑦 − 𝑦̅)² − ∑(𝑦 − 𝑦̂)²)/∑(𝑦 − 𝑦̅)²
- 0 ≤ 𝑅² ≤ 1

- the closer to 1, the stronger the linear relationship
- interpretation: 0 < negligible < .02 ≤ small < .13 ≤ moderate < .26 ≤ large
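a numeric sketch (simulated data, numpy) verifying 𝑇𝑆𝑆 = 𝑆𝑆𝐸 + 𝑅𝑆𝑆 and that 𝑅² equals the squared correlation in simple regression:

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200)                   # hypothetical data
y = 1.5 * x + rng.normal(size=200)

b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

tss = np.sum((y - y.mean()) ** 2)          # marginal variation
sse = np.sum((y - y_hat) ** 2)             # unexplained (conditional) variation
rss = np.sum((y_hat - y.mean()) ** 2)      # explained variation

print(tss, sse + rss)                      # TSS = SSE + RSS
r2 = (tss - sse) / tss
print(r2, np.corrcoef(x, y)[0, 1] ** 2)    # R^2 = r^2 in simple regression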

inferential statistics: using sample data to make inferences about the population parameters
- we can’t confirm hypotheses, but we can falsify them
o by inspecting the probability of finding 𝑏 (or 𝑟) if the null hypothesis is true
o null hypothesis: no association between variables (independent)
▪ 𝐻0: 𝛽 = 0
o alternative hypothesis: association between variables (dependent)
▪ 𝐻𝑎: 𝛽 ≠ 0
▪ if directional: 𝛽 < 0 or 𝛽 > 0
- check significance of 𝑏 using the 𝑡-statistic
o under 𝐻0: 𝛽 = 0, 𝑡 = 𝑏/𝑠𝑒 with 𝑑𝑓 = 𝑛 − 2
- check significance of 𝑅² using the 𝐹-statistic
o 𝐹 = (𝑅²/1) / ((1 − 𝑅²)/(𝑛 − 2)) = ((𝑇𝑆𝑆 − 𝑆𝑆𝐸)/1) / (𝑆𝑆𝐸/(𝑛 − 2)) = (𝑅𝑆𝑆/1) / (𝑆𝑆𝐸/(𝑛 − 2)) = 𝑀𝑆𝑅/𝑀𝑆𝐸
▪ 𝑑𝑓1 = 𝑘 = 1, where 𝑘 = number of regression parameters 𝑏
▪ 𝑑𝑓2 = 𝑛 − 𝑘 − 1 = 𝑛 − 2
- based on the 𝑡- or 𝐹-statistic, determine the 𝑝-value:
o what is the probability of finding a result this extreme, when the 𝐻0 is true?
- 𝐹 = 𝑡² → both options yield the same conclusion
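a quick check with scipy on simulated data: the 𝑡-test of 𝑏 and the 𝐹-test of 𝑅² give the same 𝑝-value:

import numpy as np
from scipy.stats import linregress, f as f_dist

rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)                     # hypothetical data
y = 0.4 * x + rng.normal(size=n)

fit = linregress(x, y)                     # two-sided test of H0: beta = 0
t = fit.slope / fit.stderr
print(t, fit.pvalue)

F = t ** 2                                 # F = t^2 with df1 = 1, df2 = n - 2
print(F, f_dist.sf(F, 1, n - 2))           # same p-value as the t-test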

4 scenarios are possible, depending on the decision and on whether 𝐻0 is actually true
- 2x erroneous decision (which we want to avoid)
o type 1 error: probability of rejecting 𝐻0 when it is true
▪ determined by the selected 𝛼-level (.05)
▪ if observed 𝑝-value < 𝛼 : reject 𝐻0
o type 2 error (𝛽): probability of not rejecting 𝐻0 when it is false
▪ determined by:
• strength of the association/difference in the population
• sample size of study
• selected 𝛼-level
o trade-off: the smaller the type 1 error, the larger the type 2 error
- 2x correct decision
o 1 − 𝛽 = power → probability of correctly rejecting 𝐻0
▪ typically aim for 80%
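these rates can be made concrete with a small simulation (hypothetical effect size 𝛽 = 0.4 and 𝑛 = 50): under a true 𝐻0 the rejection rate sits near 𝛼, and under a false 𝐻0 it estimates the power:

import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(5)

def rejection_rate(beta, n=50, reps=2000, alpha=0.05):
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        if linregress(x, y).pvalue < alpha:
            rejections += 1
    return rejections / reps

print(rejection_rate(beta=0.0))  # H0 true: rejection rate ~ alpha (type 1 error)
print(rejection_rate(beta=0.4))  # H0 false: rejection rate = power (~.80 here)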

assumptions of linear regression:
- representativeness: analyses are based on a random sample
- functional form: relation between 𝑥 and 𝑦 is linear
- homoscedasticity: the conditional variance of 𝑦 around the regression line is equal for all 𝑥
- normal distribution: the conditional distribution of 𝑦 for all 𝑥 is normal
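a common way to eyeball the last 3 assumptions is a residual plot plus a normal Q-Q plot; a sketch with matplotlib/scipy on simulated data:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import probplot

rng = np.random.default_rng(6)
x = rng.normal(size=200)                   # hypothetical data
y = 1.5 * x + rng.normal(size=200)

b, a = np.polyfit(x, y, 1)
fitted = a + b * x
resid = y - fitted

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(fitted, resid, s=10)           # curvature -> functional form violated,
ax1.axhline(0, color="grey")               # funnel shape -> heteroscedasticity
ax1.set(xlabel="fitted", ylabel="residual")
probplot(resid, plot=ax2)                  # straight Q-Q line -> roughly normal residuals
plt.tight_layout()
plt.show()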



