ISYE 6414 MIDTERM EXAM QUESTIONS WITH COMPLETE SOLUTIONS
The estimated mean response for one setting under which the predicting variable is
equal to x*
In prediction we focus on one particular setting.
Prediction contains 2 sources of uncertainty:
- Due to the new observation
- Due to the parameter estimates of Beta_0 and Beta_1 (same as estimation)
Estimation vs Prediction - Answer-The uncertainty in estimation comes from the
estimation of the regression parameters alone, whereas the uncertainty in prediction
comes from the estimation of the regression parameters and from the newness of the
observation.
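The two sources of uncertainty can be seen in the standard errors for simple linear regression. The sketch below is only an illustration: the data are made up, and the function names (fit_simple_regression, standard_errors_at) are hypothetical. It assumes the usual model with independent errors of constant variance. The prediction variance equals the estimation variance plus s^2, the extra term due to the new observation.

```python
import math

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_simple_regression(x, y):
    """Return least-squares estimates and quantities needed for standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error, n - 2 degrees of freedom
    return beta0, beta1, s, x_bar, s_xx


def standard_errors_at(x_star, x, y):
    """Fitted value at x_star with the estimation and prediction standard errors."""
    n = len(x)
    beta0, beta1, s, x_bar, s_xx = fit_simple_regression(x, y)
    # Uncertainty from estimating Beta_0 and Beta_1 only.
    param_term = 1.0 / n + (x_star - x_bar) ** 2 / s_xx
    se_mean = s * math.sqrt(param_term)
    # Prediction adds the variance of the new observation (the leading 1.0).
    se_pred = s * math.sqrt(1.0 + param_term)
    return beta0 + beta1 * x_star, se_mean, se_pred


fitted, se_mean, se_pred = standard_errors_at(6.0, x, y)
print(fitted, se_mean, se_pred)
```

The prediction standard error is always the larger of the two, so a prediction interval at x* is wider than the confidence interval for the mean response at the same x*.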
Outliers - Answer-This is any data point that is far from the majority of the data (in both
X and Y)
Leverage Points - Answer-These are data points that are far from the mean of the X's
Influential points - Answer-These are data points that are far from the mean of both the
X's and the Y's.
They are called influential because they influence the fit of the regression, i.e. they can
significantly change the value and magnitude of the estimated parameters, their
statistical significance, or even their sign.
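As a rough illustration of leverage and influence, the sketch below computes the hat values for simple linear regression, h_ii = 1/n + (x_i - x_bar)^2 / S_xx, and refits the line without the most extreme point. The data and the function names (fit, leverages) are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical toy data; the last point lies far from the other x values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.0]


def fit(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def leverages(x):
    """Hat values h_ii = 1/n + (x_i - x_bar)^2 / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


h = leverages(x)
_, slope_all = fit(x, y)                  # fit using every point
_, slope_without = fit(x[:-1], y[:-1])    # fit with the far-out point removed

print("leverages:", h)
print("slope with all points:", slope_all)
print("slope without last point:", slope_without)
```

In this made-up example the last point has by far the largest leverage, and dropping it changes the estimated slope substantially, which is what marks it as influential rather than merely a leverage point.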
Coefficient of Determination - Answer-One approach to evaluate the predictive
power of the model is using the coefficient of determination.
This is the R^2 value: the proportion of the total variability in Y that can be explained by
the linear regression that uses X
Correlation Coefficient - Answer-This is another approach, used to measure the strength
of the linear relationship between 2 variables.
The relationship between the correlation coefficient and R^2 is that, in simple linear
regression, the square of the correlation coefficient is equal to R^2
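The link between the correlation coefficient and R^2 can be checked numerically. This is a minimal sketch with made-up data and hypothetical function names (correlation, r_squared). It assumes simple linear regression with a single predictor, which is the setting where the identity holds.

```python
import math

# Hypothetical toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def correlation(x, y):
    """Sample (Pearson) correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


def r_squared(x, y):
    """R^2 = 1 - SSE / SST for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    sse = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


r = correlation(x, y)
print(r ** 2, r_squared(x, y))  # the two values agree up to rounding
```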
ANOVA - Answer-ANOVA is a linear regression model where the predicting factor is a
categorical variable
Objectives of ANOVA - Answer-The overarching objective in ANOVA is to compare the
means across k populations:
- Analyze the variability in the data: compare the within variability (variability within each
group) vs the between variability (variability between the means)
- Test whether the means are equal
- Estimate confidence intervals for all the pairs of means in order to identify which of the
means are not equal, i.e. which are statistically significantly different
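Pooled Variance Estimator (MSE) - Answer-The sampling distribution of the pooled
variance estimator is a Chi-squared distribution with N-k degrees of freedom. More
precisely, assuming normal errors with shared variance sigma^2, (N-k) * MSE / sigma^2
follows a Chi-squared distribution with N-k degrees of freedom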
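The pooled variance estimator can be sketched as the sum of squared errors divided by N-k, which is the same as averaging the group sample variances with weights n_i - 1. The groups below are made-up numbers and the function name pooled_variance is hypothetical.

```python
from statistics import variance  # sample variance (divides by n - 1)

# Hypothetical toy data: k = 3 groups, N = 10 observations in total.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 9.0, 8.0, 8.0],
    "C": [10.0, 12.0, 11.0],
}


def pooled_variance(groups):
    """MSE = SSE / (N - k), with SSE built from the within-group sample variances."""
    k = len(groups)
    n_total = sum(len(values) for values in groups.values())
    # (n_i - 1) * s_i^2 is the sum of squared deviations within group i.
    sse = sum((len(values) - 1) * variance(values) for values in groups.values())
    return sse / (n_total - k)


print(pooled_variance(groups))
```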
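Estimation of Mean Parameters - Answer-We use the sample mean of individual
samples to estimate the mean parameters.
Because the variance is replaced by the pooled variance estimator, the standardized
sample mean, (Ybar_i - mu_i) / sqrt(MSE / n_i), has a T distribution with N-k degrees of
freedom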
ANOVA Parameters - Answer-The parameters in ANOVA are the mean parameters as
well as the shared variance.
Hypothesis Testing for Equal Means - Answer-Null Hypothesis: The means are all
equal
Alternative Hypothesis: Some means are different
Not all of them have to be different, but at least 1 pair (2) of the means needs to be
different
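The test of equal means is usually carried out with the one-way ANOVA F statistic, the ratio of the between-group mean square (MSTr) to the within-group mean square (MSE). The sketch below only computes the statistic and its degrees of freedom. The data are made up, the function name anova_f_statistic is hypothetical, and the p-value step (comparing against an F distribution with k-1 and N-k degrees of freedom) is left out.

```python
# Hypothetical toy data: k = 3 groups, N = 10 observations in total.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 9.0, 8.0, 8.0],
    [10.0, 12.0, 11.0],
]


def anova_f_statistic(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variability: group means around the overall mean.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: observations around their own group mean.
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return mstr / mse, k - 1, n_total - k


f_stat, df_between, df_within = anova_f_statistic(groups)
print(f_stat, df_between, df_within)
```

A large F value means the between-group variability is large relative to the within-group variability, which is evidence against the null hypothesis that all means are equal.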
Variance Estimator (Different from pooled variance estimator) - Answer-This is the sample
variance S^2 of all N observations combined. Under the null hypothesis of equal means,
(N-1) * S^2 / sigma^2 has a Chi-square distribution with N-1 degrees of freedom
Sum of Squares Total (SST) - Answer-The sum of the Sum of Squared Errors and the Sum
of Squares of Treatments: SST = SSE + SSTr
Sum of Squared Errors (SSE) - Answer-This is the sum of squared differences between the
observations and their individual sample means
Sum of Squares of Treatments (SSTr) - Answer-This is the sum over the groups of n_i * the
squared difference between the sample mean of the individual sample and the overall
mean: SSTr = sum over i of n_i * (Ybar_i - Ybar)^2
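The decomposition of the total sum of squares into error and treatment parts can be verified on a small example. This is a sketch on made-up data with a hypothetical function name (sums_of_squares), not a definitive implementation.

```python
# Hypothetical toy data: k = 3 groups, N = 10 observations in total.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 9.0, 8.0, 8.0],
    [10.0, 12.0, 11.0],
]


def sums_of_squares(groups):
    """Return (SST, SSE, SSTr) for a one-way layout."""
    all_values = [v for g in groups for v in g]
    overall_mean = sum(all_values) / len(all_values)
    # SST: every observation around the overall mean.
    sst = sum((v - overall_mean) ** 2 for v in all_values)
    sse = 0.0
    sstr = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # SSE: observations around their own group mean.
        sse += sum((v - group_mean) ** 2 for v in g)
        # SSTr: n_i times the squared distance of the group mean from the overall mean.
        sstr += len(g) * (group_mean - overall_mean) ** 2
    return sst, sse, sstr


sst, sse, sstr = sums_of_squares(groups)
print(sst, sse, sstr)
```

Dividing SSE by N-k gives the pooled variance estimator (MSE), and dividing SST by N-1 gives the sample variance of all observations combined.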
Within Group Variability - Answer-This is the variability within each group. It is
calculated by: