AND ANSWERS
covariance - ANSWER-an unstandardized statistical measure summarizing
the general pattern of association (or the lack thereof) between two
continuous variables
- positive number means positive relationship
- negative number means negative relationship
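A minimal R sketch of the covariance card, using made-up numbers (the vectors x and y are hypothetical, not from the notes). It computes the sample covariance by hand, checks it against the built-in cov(), and shows why the measure is called unstandardized: rescaling a variable rescales the covariance.

```r
# Hypothetical illustrative data (not from the notes)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

# Sample covariance by hand: sum of cross-products of deviations / (n - 1)
n <- length(x)
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Built-in equivalent
cov_builtin <- cov(x, y)

# Unstandardized: multiplying x by 100 multiplies the covariance by 100,
# even though the pattern of association is unchanged
cov_rescaled <- cov(x * 100, y)

print(c(manual = cov_manual, builtin = cov_builtin, rescaled = cov_rescaled))
```

The sign tells you the direction of the relationship; the size depends on the units, which is why the correlation coefficient is usually easier to interpret.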
Pearson's r (aka correlation coefficient) - ANSWER-- r > 0.8 = strong positive
linear correlation
- 0.8 > r > 0.5 = decent positive linear correlation
- 0.5 > r > 0.3 = weak positive linear correlation
- reverse the signs for negative r (e.g., r < -0.8 = strong negative linear
correlation)
hypothesis testing with r - ANSWER-null: r = 0
alternative: r ≠ 0
- we can calculate a t-statistic to determine if we can reject the null that
there is no linear relationship between X and Y
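A rough R sketch of that t-statistic, assuming the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom (the formula is not spelled out in these notes, and the data below are hypothetical). It is compared against what cor.test() reports.

```r
# Hypothetical illustrative data (not from the notes)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 4.2, 3.8, 5.5, 6.1, 6.8, 8.4)

r <- cor(x, y)
n <- length(x)

# t-statistic for the null of no linear relationship, df = n - 2
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Built-in test for comparison
test <- cor.test(x, y, method = "pearson")

print(c(t_manual = t_stat, t_builtin = unname(test$statistic)))
print(c(p_manual = p_value, p_builtin = test$p.value))
```

If the p-value falls below the chosen significance level (commonly 0.05), the null of no linear relationship is rejected.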
correlation coefficient warnings - ANSWER-- it is a measure of linear
association between two CONTINUOUS variables
- represents TIGHTNESS of linear relationship, NOT MAGNITUDE
- is the building block for linear regression
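One way to see the "tightness, not magnitude" warning in R, using hypothetical data: two outcomes with the same scatter pattern but very different slopes have the same r. The last lines also sketch the "building block" point, assuming the standard identity that the bivariate regression slope equals r * sd(y) / sd(x).

```r
# Hypothetical illustrative data (not from the notes)
x <- 1:10
noise <- c(0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.4, 0.1, -0.1)

y_shallow <- 0.5 * x + noise   # small effect of x on y
y_steep   <- 10 * y_shallow    # same scatter pattern, effect 10 times larger

# Same tightness around the line, so the same r
r_shallow <- cor(x, y_shallow)
r_steep   <- cor(x, y_steep)

# Very different magnitudes (slopes)
slope_shallow <- unname(coef(lm(y_shallow ~ x))[2])
slope_steep   <- unname(coef(lm(y_steep ~ x))[2])

# Building block: slope recovered from r and the two standard deviations
slope_from_r <- r_shallow * sd(y_shallow) / sd(x)

print(c(r_shallow = r_shallow, r_steep = r_steep))
print(c(slope_shallow = slope_shallow, slope_steep = slope_steep,
        slope_from_r = slope_from_r))
```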
R code for correlation - ANSWER-Correlation Coefficient
- cor(data$x, data$y, use = "pairwise.complete.obs")
Correlation Coefficient with Hypothesis Test
- cor.test(data$x, data$y, method = "pearson", conf.level = 0.95)
Correlation Matrix
- cor(cbind(data$x, data$y, data$z, data$m, data$q), use = "pairwise.complete.obs")
regression - ANSWER-- drawing the best fitting straight line through a cloud
of data
- why? to make an inference about the relationship between a dependent
variable (Y) and one or more independent variables (X)
predicting using regression - ANSWER-- Y = a + bX (a = intercept, b = slope)
- %Dems in Leg. = 22.68 + (1.03 x %Dems in State)
1. Calculate expected % Dem Legislators when there are 0% Dems in the
state
- 22.68 + (1.03 x 0) = 22.68
2. Calculate expected % Dem Legislators when there are 25% Dems in the
state
- 22.68 + (1.03 x 25) = 48.43
3. Calculate expected % Dem Legislators when there are 26% Dems in the
state
- 22.68 + (1.03 x 26) = 49.46
- the difference between the predicted Y values for 25% and 26% is 1.03, which
is the slope b: each one-point increase in X raises predicted Y by 1.03
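The same arithmetic can be sketched in R. The intercept and slope come from the equation on this card; the function name predict_dem_leg is a hypothetical helper made up for illustration.

```r
# Coefficients from the card: %Dems in Leg. = 22.68 + (1.03 x %Dems in State)
a <- 22.68  # intercept
b <- 1.03   # slope

# Hypothetical helper: predicted % Dem legislators for a given % Dems in state
predict_dem_leg <- function(pct_dem_state) {
  a + b * pct_dem_state
}

# Predictions for 0%, 25%, and 26% Dems in the state
predictions <- predict_dem_leg(c(0, 25, 26))
print(predictions)

# A one-point increase in X changes the prediction by the slope
one_point_change <- predictions[3] - predictions[2]
print(one_point_change)
```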
t-test statistic - ANSWER-- if we reject the null that the slope is 0 (using
p-values, confidence intervals, or the critical value approach), we conclude
there is a statistically significant relationship between X and Y
R code for bivariate linear regression - ANSWER-bivariate linear regression
- lm(Y ~ X, data = dataset)
storing bivariate regression
- regression1 <- lm(Y ~ X, data = dataset)
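A short sketch of what can be done with the stored model. The notes do not say which dataset is used, so this assumes R's built-in mtcars as a stand-in, with Y = mpg and X = wt. It pulls out the coefficients, the t-test for the slope, a 95% confidence interval, and predictions at two hypothetical X values.

```r
# Stand-in data (assumption): built-in mtcars, Y = miles per gallon, X = weight
dataset <- data.frame(X = mtcars$wt, Y = mtcars$mpg)

# Fit and store the bivariate regression
regression1 <- lm(Y ~ X, data = dataset)

# Intercept (a) and slope (b)
coefs <- coef(regression1)
print(coefs)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(regression1)$coefficients
print(coef_table)

# 95% confidence interval for each coefficient
ci <- confint(regression1, level = 0.95)
print(ci)

# Predicted Y at new (hypothetical) X values
new_predictions <- predict(regression1, newdata = data.frame(X = c(2.5, 3.5)))
print(new_predictions)
```

If the p-value in the X row is below 0.05, or equivalently the 95% confidence interval for the slope excludes 0, the null of no relationship is rejected.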