Summary Intermediate Statistics 2 Study Guide

Pages: 18
Uploaded on: 08-06-2022
Written in: 2021/2022

Study guide for Intermediate Statistics 2, including notes from the lectures, the textbook, and the PBLs (problem-based learning sessions).


Maria Andrade



Stats II Study Guide


Week 1: Revision Stats I & Dummy Coding


Revision Stats 1

Linear Regression

● Dependent variable → Y
● Independent variable(s) → X
● Function of linear regression: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
○ $\beta_0$ → population y-intercept
○ $\beta_1$ → population slope coefficient
○ $X_i$ → independent variable
○ $\varepsilon_i$ → random error

Example: interpretation of the betas
● Model: $\text{price}_i = \beta_0 + \beta_1 \cdot \text{squaremeters}_i + \beta_2 \cdot \text{bedrooms}_i + \varepsilon_i$
● $\beta_0$: the predicted house price when the number of bedrooms is 0 and the number of square meters is 0
● $\beta_1$: the increase in the predicted house price for every additional square meter, holding the number of bedrooms constant
● $\beta_2$: the increase in the predicted house price for every additional bedroom, holding the number of square meters constant
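A minimal sketch of fitting the house-price model by ordinary least squares, assuming NumPy is available. The data and the "true" betas are made up, and prices are generated without noise, so the fitted coefficients should recover those betas and can be read exactly as in the interpretation above.

```python
import numpy as np

# Hypothetical toy data: square meters and number of bedrooms per house.
sqm = np.array([50.0, 60.0, 80.0, 100.0, 120.0, 75.0])
bedrooms = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 3.0])

# Prices generated without noise from made-up betas (assumption for illustration).
true_b0, true_b1, true_b2 = 50_000.0, 2_000.0, 10_000.0
price = true_b0 + true_b1 * sqm + true_b2 * bedrooms

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(sqm), sqm, bedrooms])

# OLS: choose the betas that minimise the sum of squared residuals.
betas, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2 = betas
print(f"B0 = {b0:.1f}, B1 = {b1:.1f}, B2 = {b2:.1f}")
```

Here B1 is the change in predicted price per extra square meter with bedrooms held fixed, and B2 the change per extra bedroom with square meters held fixed.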

P-Values

● Alpha = 0.05 → how often we allow ourselves to make a mistake, i.e. the probability of rejecting H0 when H0 is actually true (Type I error rate)
● Compare the p-value with alpha → if the p-value is lower than alpha, reject H0
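The decision rule can be sketched in a few lines. This assumes a z statistic with a standard normal null distribution (a large-sample approximation; regression coefficients are normally tested with a t distribution), and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_h0(p_value: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the p-value is lower than alpha."""
    return p_value < alpha


for z in (1.0, 2.5):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, reject H0: {reject_h0(p)}")
```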

Model Fit: to assess model fit, use SST, SSR and SSM

● SST (total sum of squares): difference between the observed data and the mean of Y → total unstandardized variance
○ Formula: $SS_T = \sum_i (y_i - \bar{y})^2$
● SSR (residual sum of squares): difference between the observed data and the model → unexplained unstandardized variance, i.e. the variation not accounted for by the model
○ Formula: $SS_R = \sum_i (y_i - \hat{y}_i)^2$
● SSM (model sum of squares): difference between the mean value of Y and the model → explained unstandardized variance, i.e. the variation accounted for by the model
○ Formula: $SS_M = \sum_i (\hat{y}_i - \bar{y})^2$
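A small worked sketch of the three sums of squares, assuming NumPy and a hypothetical five-case data set with one predictor. For an OLS fit with an intercept, SST splits exactly into SSM + SSR.

```python
import numpy as np

# Hypothetical data with one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Simple OLS fit; np.polyfit with deg=1 returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)      # total variation around the mean of Y
ssr = np.sum((y - y_hat) ** 2)      # variation the model leaves unexplained
ssm = np.sum((y_hat - y_bar) ** 2)  # variation the model explains

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSM = {ssm:.2f}")
```

With these numbers the fitted line is 2.2 + 0.6x, giving SST = 6.0, SSR = 2.4 and SSM = 3.6.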


F-Ratio

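● F-ratio: the ratio between the standardized SSM and the standardized SSR, where "standardized" means each sum of squares is divided by its degrees of freedom

○ Formula: $F = \frac{MS_M}{MS_R}$

■ MSM formula: $MS_M = \frac{SS_M}{k}$, with $k$ the number of predictors
● MSM stands for the standardized explained variance

■ MSR formula: $MS_R = \frac{SS_R}{N - k - 1}$, with $N$ the number of cases
● MSR stands for the standardized unexplained variance
○ When the F-ratio is high → the model explains much more variance than it leaves unexplained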
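The F-ratio formula above as a short function; the name `f_ratio` is hypothetical, and the sums of squares in the usage line are made-up values for 5 cases and 1 predictor.

```python
def f_ratio(ssm: float, ssr: float, n: int, k: int) -> float:
    """F = MSM / MSR for a regression with n cases and k predictors."""
    msm = ssm / k            # explained variance per model degree of freedom
    msr = ssr / (n - k - 1)  # unexplained variance per residual degree of freedom
    return msm / msr


# Hypothetical sums of squares: SSM = 3.6, SSR = 2.4, 5 cases, 1 predictor.
print(f_ratio(ssm=3.6, ssr=2.4, n=5, k=1))
```

Here MSM = 3.6 / 1 and MSR = 2.4 / 3 = 0.8, so F is about 4.5.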
R^2

● R²: the proportion of explained variance over total variance

○ Formula: $R^2 = \frac{SS_M}{SS_T}$ (equivalently $1 - \frac{SS_R}{SS_T}$)
● Can be used to compare models, to see whether one explains more variance than the other
● The higher the R², the more variance is explained
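A minimal sketch of R² and a model comparison through the change in R², assuming two hypothetical models fitted to the same outcome (so they share SST). The SSM values are illustrative, not from the notes.

```python
def r_squared(ssm: float, sst: float) -> float:
    """Proportion of the total variance that the model explains."""
    return ssm / sst


# Hypothetical: same outcome (SST = 6.0), model 2 adds one predictor to model 1.
sst = 6.0
r2_model1 = r_squared(ssm=3.6, sst=sst)
r2_model2 = r_squared(ssm=4.2, sst=sst)
print(f"R2 model 1 = {r2_model1:.2f}, R2 model 2 = {r2_model2:.2f}, "
      f"change = {r2_model2 - r2_model1:.2f}")
```

Using SSR instead gives the same value: with SSR = 2.4, 1 - 2.4 / 6.0 is also 0.6.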

Assumptions of a Linear Model

● If the assumptions are not met, the inferences drawn from the results are invalid.


● Linearity
○ Meaning: Y_i is a linear function of the predictors
○ Check: residuals plot (X = ZPRED, Y = ZRESID); the assumption holds if the residuals are symmetric around 0
○ Fix: transform the data or change the model
● Independence of errors
○ Meaning: the errors are independent
○ Check: for time-series data, the Durbin-Watson test (not for cross-sectional data)
○ Fix: multilevel modeling or clustered SEs
● Normality (of errors)
○ Meaning: the errors are normally distributed
○ Check:
■ 1) Histograms
■ 2) PP/QQ plots: the PP-plot magnifies deviations in the middle, the QQ-plot magnifies deviations in the tails
■ 3) Kolmogorov-Smirnov (KS) and Shapiro-Wilk (SW) tests
■ 4) Skewness and kurtosis: $z_S = S / SE_{skewness}$ and $z_K = K / SE_{kurtosis}$
○ Fix: SEs are inflated; correct the SEs by transforming the data or bootstrapping
● Homoscedasticity
○ Meaning: the errors have equal variance
○ Check: ZPRED-ZRESID plot; Levene's test
○ Fix: SEs are inflated; transform the data or use bootstrapping
● Multicollinearity
○ Meaning: 2 or more predictors are highly correlated with each other, so they explain the same variance
○ Check: VIF > 10 or tolerance < 0.1; average VIF "much larger" than 1
○ Fix: remove variables
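Two of the checks above can be computed directly. A sketch assuming NumPy, with hypothetical function names: the VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors, and tolerance is 1 / VIF. The Durbin-Watson statistic is the sum of squared successive residual differences over the sum of squared residuals; it ranges from 0 to 4, and values near 2 suggest no first-order autocorrelation. The simulated predictors are made up.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out


def durbin_watson(residuals: np.ndarray) -> float:
    """Sum of squared successive differences over the sum of squared residuals."""
    diff = np.diff(residuals)
    return float(diff @ diff / (residuals @ residuals))


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # almost a copy of x1 -> collinear
x3 = rng.normal(size=200)                  # unrelated predictor
vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF:", vifs.round(1), "tolerance:", (1 / vifs).round(3))
print("DW:", round(durbin_watson(rng.normal(size=200)), 2))
```

The nearly duplicated predictors x1 and x2 get VIFs far above 10, while x3 stays close to 1.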



Outliers

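● An outlier is a case with an extreme value in Y (a large residual)
● It is a cause for concern when (standardized residuals, in absolute value):
○ more than 5% of cases lie beyond 1.96 SD
○ more than 1% of cases lie beyond 2.58 SD
○ any case lies beyond 3.29 SD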
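The three cut-offs can be applied mechanically to a list of standardized residuals. A sketch with a hypothetical helper name, assuming the residuals are already standardized and the cut-offs are applied to absolute values.

```python
def outlier_flags(z_resid: list[float]) -> dict[str, bool]:
    """Check standardized residuals against the 1.96 / 2.58 / 3.29 rules."""
    n = len(z_resid)
    share_196 = sum(abs(z) > 1.96 for z in z_resid) / n
    share_258 = sum(abs(z) > 2.58 for z in z_resid) / n
    return {
        "more than 5% beyond 1.96": share_196 > 0.05,
        "more than 1% beyond 2.58": share_258 > 0.01,
        "any case beyond 3.29": any(abs(z) > 3.29 for z in z_resid),
    }


# Hypothetical standardized residuals for 10 cases.
z_ok = [0.3, -1.2, 0.9, 1.5, -0.7, 0.2, -1.8, 1.1, 0.0, -0.4]
z_bad = [0.1, -0.5, 2.0, 1.2, -0.3, 0.8, -1.1, 0.4, 3.5, -0.2]
print(outlier_flags(z_ok))
print(outlier_flags(z_bad))
```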

Influential Cases

● A case that influences any part of the regression analysis
● It is an extreme value in X → it pulls the regression line towards itself
● Diagnostics:
○ Leverage → measures the potential to influence the regression
○ Mahalanobis distance → measures the potential to influence the regression
○ DFFit(s) → difference in the predicted value of Y for the case when it is included versus excluded
○ Standardized DFBeta (SDFBeta) → change in one regression coefficient after excluding the case
○ Cook's distance → overall influence of the case on the model: the combined change in all regression coefficients after excluding the case
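A sketch of two of these diagnostics, assuming NumPy and a hypothetical helper name. Leverage is the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ, and Cook's distance uses the standard form D_i = e_i² / (p · MSE) · h_i / (1 - h_i)², with p the number of coefficients. The data are made up so that the last case is extreme in X and lies off the trend.

```python
import numpy as np


def leverage_and_cooks(X: np.ndarray, y: np.ndarray):
    """Hat values and Cook's distance for an OLS fit.

    X must already contain a column of ones for the intercept.
    """
    n, p = X.shape
    H = X @ np.linalg.pinv(X.T @ X) @ X.T  # hat matrix
    h = np.diag(H)                         # leverage of each case
    resid = y - H @ y                      # observed minus fitted values
    mse = resid @ resid / (n - p)
    cooks = resid**2 / (p * mse) * h / (1 - h) ** 2
    return h, cooks


# Hypothetical data: the last case is extreme in X and off the trend in Y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.0, 6.1, 6.9, 8.2, 8.9, 5.0])
X = np.column_stack([np.ones_like(x), x])
h, cooks = leverage_and_cooks(X, y)
print("leverage:", h.round(3))
print("Cook's D:", cooks.round(3))
```

The last case has the highest leverage (about 0.79) and by far the largest Cook's distance.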


Dummy Coding

Dummy coding → a way to include a categorical predictor with multiple categories in a regression

Steps:
1. Recode the variable into dummies
2. Number of dummies = number of categories - 1
3. Each dummy is 1 for its particular category and 0 otherwise
4. The reference category is 0 on all dummies
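The four steps can be sketched as a small function. The function name, the `is_` prefix for the dummy names, and the example variable are hypothetical.

```python
def dummy_code(values: list[str], reference: str) -> dict[str, list[int]]:
    """Recode a categorical variable into (number of categories - 1) dummies.

    Each dummy is 1 for its own category and 0 otherwise; the reference
    category gets 0 on every dummy.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not one of {categories}")
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }


# Hypothetical variable with three categories; "north" is the reference.
region = ["north", "south", "east", "south", "north"]
for name, column in dummy_code(region, reference="north").items():
    print(name, column)
```

The cases coded "north" are 0 on both dummies, so each dummy's coefficient compares its category with "north".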
Seller: mcandradep01, Erasmus Universiteit Rotterdam