Exam (elaborations)

BANA 4080 DATA MINING EXAM 2 QUESTIONS WITH VERIFIED ANSWERS

Pages
5
Uploaded on
26-03-2025
Written in
2024/2025

Institution
DATA MINING
Course
DATA MINING










BANA 4080 DATA MINING EXAM 2
QUESTIONS WITH VERIFIED
ANSWERS
Stepwise Regression - Answer-•Stepwise regression is similar to forward selection,
except at each step, the algorithm also considers dropping predictors that are not
statistically significant, as in backward elimination.

Ordinary Least Squares (OLS) - Answer-A method to estimate the coefficients of the
regression formula.
•OLS finds values $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ that minimize the sum
of squared deviations between the actual values $Y$ and their predicted values $\hat{Y}$
based on the model.
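
As a minimal sketch of the idea, the closed-form OLS estimates for a model with a single predictor can be computed by hand. The function names and the toy data below are hypothetical and chosen only for illustration; a model with several predictors would use the same principle with matrix algebra.

```python
def simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared deviations for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope estimate
    b0 = mean_y - b1 * mean_x      # intercept estimate
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared deviations between actual y and predicted y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Toy data (invented for this example).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = simple_ols(x, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}, SSE={sse(x, y, b0, b1):.4f}")
```

Moving either coefficient away from the fitted values can only increase the sum of squared deviations, which is what "least squares" means.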

Selecting Subsets of Predictors - Answer-Goal: Find a parsimonious model (the
simplest model that performs sufficiently well)
◦More robust
◦Higher predictive accuracy
◦We will assess predictive accuracy on validation data
◦Exhaustive search = "best subset"
◦Partial search algorithms: forward, backward, stepwise

Mallows's Cp for the best model with 1 predictor, 2 predictors, etc. (exhaustive
search) - Answer-Cp approaches p + 1 as the number of predictors in the model
approaches p, the total number of candidate predictors. Here you would choose the
model with 9 predictors since it has the smallest Cp.

Quiz 7 Answers with Explanation - Answer-

Refer to the correlation matrix for the Boston Housing data below.
Based on this correlation matrix, what would you conclude is the best single
predictor for median home value? - Answer-A predictor is useful for prediction when
it is strongly correlated with the response variable. The sign of the correlation
does not matter; what matters is its absolute magnitude: the larger the absolute
value of the coefficient, the stronger the relationship. To find the best single
predictor of median home value (MEDV), look for the variable whose correlation with
MEDV is largest in absolute value. In the given correlation matrix, the correlation
between MEDV and LSTAT is the highest in absolute magnitude, so LSTAT is the best
single predictor for MEDV.
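
The selection rule above (largest absolute correlation with the response) can be sketched as follows. The numbers are invented toy values, not the Boston Housing data; only the column names are borrowed for readability, and `pearson` is a hypothetical helper.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy values (assumed for illustration only).
predictors = {
    "LSTAT": [5, 10, 15, 20, 25],
    "RM": [7.0, 6.0, 6.5, 5.5, 6.2],
    "CRIM": [0.1, 0.9, 0.2, 0.8, 0.3],
}
medv = [35, 27, 22, 16, 12]

correlations = {name: pearson(values, medv) for name, values in predictors.items()}

# Rank by absolute value: a strong negative correlation is as useful as a strong positive one.
best = max(correlations, key=lambda name: abs(correlations[name]))
print(best, round(correlations[best], 3))
```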

Refer to the below linear regression summary for predicting wine quality. Which
predictors are significant at a 95% confidence level? (Select all that apply.) - Answer-
A predictor is significant at the 95% confidence level if either:
- the p-value corresponding to its t-statistic is less than the 0.05 level of
significance, or
- the absolute value of its t-statistic is larger than the critical t-value.
The critical t-value for 3917 (11+3906) degrees of freedom at the 0.05 level of
significance (two-tailed) is 1.9606 (calculated using the Excel function
T.INV.2T(0.05,3917)).
Predictors significant at the 95% confidence level: sulphates, alcohol, residual sugar,
volatile acidity, free sulphur dioxide, density, fixed acidity, pH
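
The critical value quoted above can be approximated without Excel. The sketch below is not Excel's algorithm: it starts from the normal quantile and adds the first correction term of the standard series expansion for Student's t quantiles, which is accurate for large degrees of freedom such as 3917. The function names and the sample t-statistics are hypothetical.

```python
from statistics import NormalDist


def t_critical_two_tailed(alpha, df):
    """Approximate two-tailed critical t-value (assumes df is large)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal quantile, about 1.96 for alpha = 0.05
    return z + (z ** 3 + z) / (4 * df)        # first-order correction toward the t distribution


def is_significant(t_stat, critical):
    """A predictor is significant when |t| exceeds the critical value."""
    return abs(t_stat) > critical


crit = t_critical_two_tailed(0.05, 3917)
print(round(crit, 4))

# Hypothetical t-statistics, for illustration only.
print(is_significant(-4.2, crit), is_significant(1.1, crit))
```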

Quiz 8 Answers with Explanation - Answer-

Below are the Mallows's Cp values that result from performing an exhaustive search
for predicting wine quality.

Based on this information, how many predictors are in the best linear model? -
Answer-8. With Mallows's Cp, you look for the model with the lowest Cp value.

Below shows the status of a forward selection at a specific point during the process.
What is the next step the algorithm will take in order to improve the model? -
Answer-Forward selection begins with no predictors and adds predictors one at a
time. The next step the algorithm will take is to add density to the model. The
algorithm stops when the contribution of additional predictors is no longer
statistically significant, that is, when the best remaining option is "none" (adding
no further predictor).
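
The forward selection loop described above can be sketched in plain Python. Note the assumptions: this version uses AIC as the stopping criterion in place of a significance test, fits each candidate model by solving the normal equations directly, and runs on invented toy data with hypothetical predictor names `x1`, `x2`, `x3`.

```python
import math


def fit_sse(rows, y, cols):
    """Fit OLS with an intercept on the chosen columns and return the SSE."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    m = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(m)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(m):
        pivot = max(range(i, m), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(m):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][m] / A[i][i] for i in range(m)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def forward_selection(rows, y, names):
    """Add one predictor at a time; stop when no addition lowers the AIC."""
    n = len(y)

    def aic(cols):
        sse = max(fit_sse(rows, y, cols), 1e-12)   # guard against log(0)
        return n * math.log(sse / n) + 2 * (len(cols) + 1)

    selected = []
    best = aic(selected)                            # start with no predictors
    while len(selected) < len(names):
        candidates = [c for c in range(len(names)) if c not in selected]
        score, col = min((aic(selected + [c]), c) for c in candidates)
        if score >= best:                           # "none" is the best option: stop
            break
        selected.append(col)
        best = score
    return [names[c] for c in selected]


# Toy data (invented): y depends on x1 and x2; x3 is not part of the true model.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 3, 8, 1, 9, 2, 7, 4]
x3 = [1, 1, 2, 2, 1, 1, 2, 2]
noise = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [3 * a + b + e for a, b, e in zip(x1, x2, noise)]
rows = list(zip(x1, x2, x3))

order = forward_selection(rows, y, ["x1", "x2", "x3"])
print(order)
```

The returned list shows the order in which predictors entered the model, which mirrors the step-by-step status shown in the quiz question.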

Practical reasons for predictor elimination: - Answer-•Difficulty in collecting data in
the future
•Inaccuracies
•High correlation with other predictor(s)
•Many missing values
•Irrelevant variable

Exhaustive Search - Answer-•Evaluates all subsets of predictors to determine the
best model.
•Even with a small number of predictors, the number of subsets is very large.
•If there are k predictors, then there are 2^k possible subsets.
•For example, in the subset of predictors we used in the Toyota Corolla linear
regression example, there were 11 predictors.
•Among those 11 variables, there are 2^11 = 2,048 possible subsets of variables!
•Beware: This is computationally intensive, not feasible for big data.
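
The subset count can be checked by enumeration. This is a small sketch; the predictor names are placeholders, not the actual Toyota Corolla variables.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every subset of the predictors, from the empty set to the full set."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset


predictors = [f"x{i}" for i in range(1, 12)]   # 11 placeholder predictor names
count = sum(1 for _ in all_subsets(predictors))
print(count)   # equals 2 ** 11

# Each extra predictor doubles the number of models an exhaustive search must fit.
for k in (11, 20, 30):
    print(k, 2 ** k)
```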

Adjusted R^2 - Answer-•Like R^2, higher values of adjusted R^2 indicate a better fit.
•Unlike R^2, the adjusted R^2 penalizes a model for its number of predictors (the
purpose of the fractional portion of the equation).
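
The equation referred to above is not reproduced in these notes. The standard form is adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of records and p the number of predictors. A short sketch with invented numbers shows the penalty at work:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n records."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical comparison: the larger model has a slightly higher R^2
# but a lower adjusted R^2 because of the penalty for extra predictors.
small_model = adjusted_r2(0.80, n=50, p=5)
large_model = adjusted_r2(0.81, n=50, p=10)
print(round(small_model, 4), round(large_model, 4))
```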

Mallows's Cp - Answer-•A smaller C_p is better.
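
For reference, a common form of the statistic is C_p = SSE_subset / sigma^2_full - n + 2(k + 1), where k is the number of predictors in the subset and sigma^2_full is the error variance estimated from the full model. The sketch below uses invented numbers and a hypothetical function name; it also illustrates the property noted earlier in these notes that C_p equals p + 1 for the full model.

```python
def mallows_cp(sse_subset, sigma2_full, n, k):
    """Mallows's Cp for a subset model with k predictors plus an intercept."""
    return sse_subset / sigma2_full - n + 2 * (k + 1)


# Hypothetical full model: n = 100 records, p = 11 predictors, SSE = 440.
n, p, sse_full = 100, 11, 440.0
sigma2_full = sse_full / (n - p - 1)   # estimated error variance of the full model

cp_full = mallows_cp(sse_full, sigma2_full, n, p)
print(cp_full)   # p + 1 for the full model

# A hypothetical 8-predictor subset with a slightly larger SSE.
cp_subset = mallows_cp(445.0, sigma2_full, n, 8)
print(cp_subset)
```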

Akaike Information Criterion (AIC) - Answer-•Measures the goodness of fit of a
model but also includes a penalty that is a function of the number of parameters in
the model
•Allows you to compare multiple models for the same data set
•Smaller AIC is better
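
As an illustrative sketch, for a linear model with normally distributed errors the AIC can be written, up to an additive constant that is the same for every model fitted to the same data, as n*ln(SSE/n) + 2k, where k is the number of estimated coefficients. The SSE values below are invented for the comparison.

```python
import math


def aic_linear(sse, n, k):
    """AIC (up to a constant) for a Gaussian linear model with k coefficients."""
    return n * math.log(sse / n) + 2 * k


# Hypothetical models fitted to the same 100 records.
aic_a = aic_linear(sse=120.0, n=100, k=4)   # fewer parameters, slightly worse fit
aic_b = aic_linear(sse=118.0, n=100, k=8)   # more parameters, slightly better fit
print(round(aic_a, 2), round(aic_b, 2))

# The small gain in fit does not offset the penalty, so model A is preferred.
preferred = "A" if aic_a < aic_b else "B"
print(preferred)
```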

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
biggdreamer Havard School