
Summary Intermediate Statistics 2 Study Guide

Rating: -
Sold: 3
Pages: 18
Uploaded on: 08-06-2022
Written in: 2021/2022

Study guide for Intermediate Statistics 2, including notes from the lectures, the textbook, and PBLs












Maria Andrade



Stats II Study Guide


Week 1: Revision Stats I & Dummy Coding


Revision Stats 1

Linear Regression

● Dependent variable → Y
● Independent variable(s) → X
● Function of linear regression: Yi = B0 + B1 · Xi + Ei
○ B0 → population y-intercept
○ B1 → population slope coefficient
○ Xi → independent variable
○ Ei → random error

Example: Interpretation of betas
● Model: pricei = B0 + B1 · squaremeteri + B2 · bedroomsi + Ei
● B0: the predicted house price when the number of bedrooms is 0 and the number of square meters is 0
● B1: the change in the predicted house price for every additional square meter, given that the number of bedrooms remains constant
● B2: the change in the predicted house price for every additional bedroom, given that the number of square meters remains constant
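The interpretation of the betas can be checked with a small sketch. The coefficient values below are hypothetical and chosen only for illustration; they are not estimates from any data set.

```python
# Hypothetical coefficients, chosen only for illustration (not estimated from data).
B0 = 50_000.0   # predicted price when square meters = 0 and bedrooms = 0
B1 = 2_000.0    # change in predicted price per additional square meter
B2 = 10_000.0   # change in predicted price per additional bedroom


def predict_price(squaremeter: float, bedrooms: int) -> float:
    """Predicted price from the regression equation (error term left out)."""
    return B0 + B1 * squaremeter + B2 * bedrooms


base = predict_price(80, 2)
one_more_sqm = predict_price(81, 2)       # bedrooms held constant
one_more_bedroom = predict_price(80, 3)   # square meters held constant

print(one_more_sqm - base)       # difference equals B1
print(one_more_bedroom - base)   # difference equals B2
```

Changing one predictor by one unit while holding the other fixed moves the prediction by exactly that predictor's beta.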

P-Values

● Alpha = 0.05 → how often we allow ourselves to make a mistake, i.e. the accepted probability of rejecting H0 when it is actually true
● Compare the p-value with alpha → if the p-value is lower than alpha, you reject H0
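A minimal sketch of the decision rule follows. For illustration it assumes a two-sided test on a z statistic (standard normal); regression output normally reports p-values based on the t distribution, so the helper `two_sided_p` is an assumption, not the method used in the course.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level used in the notes


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic under the standard normal (assumption)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_h0(p_value: float, alpha: float = ALPHA) -> bool:
    """Reject H0 when the p-value is lower than alpha."""
    return p_value < alpha


for z in (1.0, 1.96, 2.58):
    p = two_sided_p(z)
    print(f"z = {z}: p = {p:.4f}, reject H0: {reject_h0(p)}")
```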

Model Fit: to test model fit you have SST, SSR and SSM

● SST
○ Description: difference between the observed data and the mean of y
○ Formula: SST = Σ (yi - ȳ)^2
○ Variance explained: total unstandardized variance
● SSR
○ Description: difference between the observed data and the model
○ Formula: SSR = Σ (yi - ŷi)^2, where ŷi is the value predicted by the model
○ Variance explained: unexplained unstandardized variance → variation not accounted for in the model
● SSM
○ Description: difference between the mean value of y and the model
○ Formula: SSM = Σ (ŷi - ȳ)^2
○ Variance explained: explained unstandardized variance → variation accounted for in the model
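The three sums of squares can be computed by hand for a simple regression with one predictor. The data below are hypothetical, and `fit_line` is an illustrative helper using the usual least-squares formulas.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


b0, b1 = fit_line(x, y)
mean_y = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]  # model predictions

sst = sum((yi - mean_y) ** 2 for yi in y)              # observed data vs. mean of y
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # observed data vs. model
ssm = sum((yh - mean_y) ** 2 for yh in y_hat)          # model vs. mean of y

print(sst, ssr, ssm)
```

For a least-squares fit with an intercept, SST equals SSM + SSR up to floating-point rounding.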


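F-Ratio

● F-ratio: the ratio between the standardized SSM and the standardized SSR, where each sum of squares is standardized by dividing it by its degrees of freedom

○ Formula: F = MSM / MSR

■ MSM formula: MSM = SSM / k, where k is the number of predictors
● MSM stands for the standardized explained variance

■ MSR formula: MSR = SSR / (N - k - 1), where N is the number of observations
● MSR stands for the standardized unexplained variance
○ When the F-ratio is high → the explained variance is high and the unexplained variance is low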
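A short sketch of the F-ratio calculation, assuming the usual definitions MSM = SSM / k and MSR = SSR / (N - k - 1), with k predictors and N observations. The input numbers are hypothetical.

```python
def f_ratio(ssm: float, ssr: float, n: int, k: int) -> float:
    """F = MSM / MSR, with each sum of squares divided by its degrees of freedom."""
    msm = ssm / k            # standardized explained variance
    msr = ssr / (n - k - 1)  # standardized unexplained variance
    return msm / msr


# Hypothetical values: SSM = 3.6, SSR = 2.4, 5 observations, 1 predictor.
f = f_ratio(ssm=3.6, ssr=2.4, n=5, k=1)
print(round(f, 3))  # approximately 4.5
```

Holding the degrees of freedom fixed, a larger SSM or a smaller SSR gives a larger F.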

R^2

● R2: the proportion of explained variance over total variance

○ Formula: R2 = SSM / SST
● Can be used to compare models, to see if one is better than the other
● The higher the R2, the more variance is explained
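As a sketch, R2 can be computed from the sums of squares, assuming the usual definition R2 = SSM / SST. The numbers are hypothetical.

```python
def r_squared(ssm: float, sst: float) -> float:
    """Proportion of the total variance that the model explains."""
    return ssm / sst


# Hypothetical sums of squares for two models fitted to the same outcome.
sst = 6.0
r2_model_a = r_squared(3.6, sst)  # approximately 0.6
r2_model_b = r_squared(4.5, sst)  # approximately 0.75

print(round(r2_model_a, 2), round(r2_model_b, 2))
print("model B explains more variance:", r2_model_b > r2_model_a)
```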

Assumptions of Linear Regression

● If the assumptions are not met, then the inferences drawn from the results are invalid.


● Linearity
○ Meaning: yi is a linear function of the predictors
○ Check: residual plot with X = ZPRED and Y = ZRESID; the residuals should be symmetric around 0
○ Fix: transform the data or change the model
● Independence of errors
○ Meaning: the errors are independent
○ Check: Durbin-Watson test for time series data (not for cross-sectional data)
○ Fix: multilevel modeling or clustered SEs
● Normality (errors)
○ Meaning: the errors are normally distributed
○ Check:
■ 1) Histograms
■ 2) PP/QQ plots: the PP plot magnifies deviations in the middle, the QQ plot magnifies deviations in the tails
■ 3) KS/SW test
■ 4) Skew & kurtosis: S / SEskewness and K / SEkurtosis
○ Fix: SEs are inflated; change through a transformation or bootstrapping
● Homoscedasticity
○ Meaning: the errors have equal variance (equality of variance of the errors)
○ Check: ZPRED-ZRESID plot, Levene's test
○ Fix: SEs are inflated; transform or bootstrap
● Multicollinearity
○ Meaning: two or more predictors are highly correlated with each other, so they explain the same variance
○ Check: VIF (> 10) or tolerance (< 0.1); average VIF "much larger" than 1
○ Fix: remove variables
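The multicollinearity check can be sketched for the simplest case of two predictors, where tolerance = 1 - r^2 and VIF = 1 / tolerance, with r the correlation between the two predictors. With more predictors, r^2 would instead come from regressing each predictor on all the others. The data are hypothetical.

```python
def squared_correlation(a, b):
    """Squared Pearson correlation between two lists of equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sab ** 2 / (saa * sbb)


# Hypothetical predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

tolerance = 1 - squared_correlation(x1, x2)
vif = 1 / tolerance

print(round(tolerance, 3), round(vif, 3))
print("cause for concern:", vif > 10 or tolerance < 0.1)
```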



Outliers

● An outlier is an extreme value in y
● It is a cause for concern when:
○ more than 5% of the data lie beyond 1.96 sd
○ more than 1% of the data lie beyond 2.58 sd
○ any case lies beyond 3.29 sd
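The three thresholds can be applied to a list of standardized residuals. This is a minimal sketch; the function name `outlier_report` and the example values are hypothetical.

```python
def outlier_report(z_residuals):
    """Apply the 1.96 / 2.58 / 3.29 sd rules of thumb to standardized residuals."""
    n = len(z_residuals)
    share_196 = sum(abs(z) > 1.96 for z in z_residuals) / n
    share_258 = sum(abs(z) > 2.58 for z in z_residuals) / n
    any_329 = any(abs(z) > 3.29 for z in z_residuals)
    return {
        "share_beyond_1.96": share_196,
        "share_beyond_2.58": share_258,
        "any_beyond_3.29": any_329,
        "cause_for_concern": share_196 > 0.05 or share_258 > 0.01 or any_329,
    }


# Hypothetical standardized residuals: 97 unremarkable cases and 3 larger ones.
z = [0.0] * 97 + [2.0, -2.1, 3.5]
report = outlier_report(z)
print(report)
```

Here only 3% of cases exceed 1.96 sd, but the single case beyond 3.29 sd is enough to raise concern.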

Influential Cases

● A case which influences any part of the regression analysis
● It is an extreme value in x → pushes the regression line
● Diagnostics:
○ Leverage → measures potential to influence regression
○ Mahalanobis distance → measures potential to influence regression
○ DFFit(s) → difference in the predicted y for a case when the model is estimated including and excluding that case
○ SDFBeta → standardized change in one regression coefficient after exclusion of the case
○ Cook's distance → the average of changes in all regression coefficients after exclusion
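Two of these diagnostics can be sketched for a regression with one predictor: leverage, using the formula h_i = 1/n + (x_i - mean x)^2 / Σ (x_j - mean x)^2, and the raw change in the slope when a case is left out. The slope change here is unstandardized; SDFBeta would additionally divide it by a standard error. Data and helper names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leverage(x):
    """Leverage of each case in a one-predictor regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def slope_change_when_dropped(x, y):
    """Slope with all cases minus slope with case i excluded (unstandardized)."""
    _, full_slope = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        _, slope_without_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        changes.append(full_slope - slope_without_i)
    return changes


# Hypothetical data: the last case is extreme in x.
x = [1, 2, 3, 4, 20]
y = [1, 2, 3, 4, 5]

h = leverage(x)
changes = slope_change_when_dropped(x, y)
print([round(v, 3) for v in h])
print([round(v, 3) for v in changes])
```

The case that is extreme in x has by far the highest leverage, and excluding it changes the slope the most.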


Dummy Coding

Dummy coding → used to include a categorical predictor with multiple categories in a regression

Steps:
1. Recode a variable into dummies
2. Number of dummies = categories - 1
3. A dummy is 0 or 1 for a particular category
4. Reference category is 0 for all dummies
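The steps above can be sketched as a small function. The name `dummy_code` and the example categories are hypothetical.

```python
def dummy_code(values, reference):
    """Recode a categorical variable into (categories - 1) 0/1 dummies.

    The reference category gets 0 on every dummy.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not found in the data")
    dummy_names = [c for c in categories if c != reference]
    rows = [{name: int(v == name) for name in dummy_names} for v in values]
    return dummy_names, rows


# Hypothetical categorical predictor with three categories.
house_type = ["apartment", "house", "studio", "house"]
names, rows = dummy_code(house_type, reference="apartment")

print(names)  # two dummies for three categories
for value, row in zip(house_type, rows):
    print(value, row)
```

Each regression coefficient on a dummy is then interpreted relative to the reference category.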