Exam (elaborations)

DSCI 4520 EXAM 1 SECTION 2 QUESTIONS WITH COMPLETE SOLUTIONS

Pages
8
Uploaded on
09-03-2025
Written in
2024/2025

Institution
WSS
Course
WSS









Which statement is INCORRECT about choosing the number of clusters in the k-means
clustering method?
A. Maximizing the within-cluster sum of squared errors (WSS) is the goal when
selecting k
B. Sometimes business considerations impose constraints on the value of k
C. The ability to do useful profiling based on the cluster centroids helps us select the right
value of k
D. Similar analyses can be used to inform our decision about the right value of k -
Answer-Maximizing the within-cluster sum of squared errors (WSS) is the goal when
selecting k
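
Why option A is the incorrect statement can be seen by computing WSS for several values of k. The sketch below is a minimal pure-Python k-means (Lloyd's algorithm), not a reference implementation; the function names and the toy points are made up for illustration. WSS shrinks as k grows and reaches zero when every point is its own cluster, so k is usually chosen where the decrease levels off (the "elbow"), not by maximizing WSS and not by driving it to zero.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wss(points, k, iterations=20, seed=0):
    """Run a basic k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Hypothetical toy data: three loose groups of three points each.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.5),
]

wss_by_k = {k: kmeans_wss(points, k) for k in (1, 3, len(points))}
for k, wss in wss_by_k.items():
    print(k, round(wss, 3))
```

With k equal to the number of points, every point is its own centroid and WSS is exactly zero, which is a useless clustering. That is why business constraints and centroid profiling (options B and C) also feed into the choice of k.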

k-nearest neighbor (k-NN) is a supervised method that can be used for predicting
categorical or numerical targets.
True
False - Answer-True

In the k-nearest neighbor models, increasing the value of k leads to overfitting.
True
False - Answer-False

With the k-NN model for a numerical target, after we have determined the k nearest
neighbors of a new data record, how is the target value predicted?
A. Majority vote determines the predicted class
B. Average of the neighbors
C. Through a logistic regression between the neighbors
D. Through a linear combination of neighbors - Answer-Average of the neighbors
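
The two prediction rules mentioned in these questions (majority vote for a categorical target, average of the neighbors for a numerical target) can be sketched as follows. This is a minimal illustration using Euclidean distance; the helper names and the toy training rows are assumptions, not part of the exam.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training rows (features, target) closest to the query."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]

def knn_classify(train, query, k):
    """Categorical target: the majority class among the k neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k):
    """Numerical target: the average target value of the k neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Hypothetical toy data for each kind of target.
class_train = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"), ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"), ((5.2, 4.9), "high"),
]
value_train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 14.0), ((10.0,), 40.0)]

predicted_class = knn_classify(class_train, (1.1, 1.0), k=3)
predicted_value = knn_regress(value_train, (2.1,), k=3)
print(predicted_class)  # low
print(predicted_value)  # 12.0
```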

What statement is correct about the k-nearest neighbor (k-NN) method?
A. Underfitted k-NN models can be fixed by adding a dummy variable for accuracy
B. Logistic regression is a special case of k-NN
C. The value of k can control model overfitting and underfitting
D. Overfitted k-NN models can be fixed by decreasing k - Answer-The value of k can
control model overfitting and underfitting

Which statement is INCORRECT about k-NN predictive models?
A. Larger values of k increase the risk of over-fitting
B. When k=n (number of data records) the k-NN and the universal average methods are
the same
C. k-NN is sensitive to irrelevant features
D. Finding the optimum value of k can be computationally expensive - Answer-Larger
values of k increase the risk of over-fitting
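
The two extremes of k can be checked directly. In the sketch below (toy data and function name are illustrative assumptions), k=1 reproduces each training target exactly, which is the overfitting end, while k=n returns the overall average for any query, which matches statement B and is the underfitting end.

```python
import math

def knn_regress(train, query, k):
    """Average target of the k training rows nearest to the query."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(value for _, value in neighbors) / k

# Hypothetical training rows: (features, numerical target).
train = [((1.0,), 10.0), ((2.0,), 30.0), ((3.0,), 20.0), ((4.0,), 40.0)]
n = len(train)
overall_average = sum(value for _, value in train) / n

# k = 1: each training record is predicted by itself (memorization).
k1_predictions = [knn_regress(train, features, 1) for features, _ in train]

# k = n: every query gets the same answer, the overall average.
kn_predictions = [knn_regress(train, (q,), n) for q in (0.0, 2.5, 100.0)]

print(k1_predictions)   # [10.0, 30.0, 20.0, 40.0]
print(kn_predictions)   # [25.0, 25.0, 25.0]
```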

When we are building a linear regression model, against what model do we compare it
to evaluate its significance?
Naïve (average) model
Logistic model
Classification model
Random model - Answer-Naïve (average) model

In a linear regression model, the t-Test for each predictor's coefficient indicates if the
estimated value is significantly different from zero.
True
False - Answer-True
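
For a simple (one-predictor) regression, the t statistic in this question is the estimated slope divided by its standard error. The following is a minimal sketch with made-up data; the function name is hypothetical, and turning the statistic into a p-value would additionally need the t distribution with n - 2 degrees of freedom, which is left out here.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, standard error, t statistic) for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and the residual variance estimate.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_variance = sse / (n - 2)
    standard_error = math.sqrt(residual_variance / sxx)
    return slope, standard_error, slope / standard_error

# Hypothetical toy data with a nearly linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, standard_error, t_stat = slope_t_statistic(x, y)
print(round(slope, 3), round(standard_error, 4), round(t_stat, 1))
```

A t statistic far from zero, as here, means the slope is many standard errors away from zero, which is what "significantly different from zero" refers to.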

In the development of a linear regression model, what is the naive (baseline) model that
we compare the performance of the linear model with?
Simple linear model
Average model
Multiple linear model
Random guess - Answer-Average model
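
The comparison against the average model can be made concrete: the naive model predicts the mean of the target for every record, and R-squared measures how much of that model's squared error the regression removes. A minimal sketch, assuming a one-predictor least-squares fit and made-up data (all names are illustrative):

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (
        sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        / sum((xi - x_bar) ** 2 for xi in x)
    )
    return y_bar - slope * x_bar, slope

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_regression(x, y)
y_bar = sum(y) / len(y)

# Naive (average) model: always predict the mean of y.
sse_naive = sum((yi - y_bar) ** 2 for yi in y)
# Linear model: predict from the fitted line.
sse_linear = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# R-squared is the share of the naive model's error removed by the line.
r_squared = 1 - sse_linear / sse_naive
print(round(sse_naive, 3), round(sse_linear, 3), round(r_squared, 4))
```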

In the following scatter plot matrix, Price is the target variable. What predictor shows the
strongest negative correlation with Price?
CC
HP
Age_08_04
Weight - Answer-Age_08_04
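
What a scatter plot matrix shows visually can also be checked numerically with the Pearson correlation of each predictor against the target. The sketch below reuses two column names from the question, but every number in it is invented for illustration and does not come from the exam's data set.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Invented values, for illustration only.
price = [20000.0, 18000.0, 15000.0, 13000.0, 10000.0]
predictors = {
    "Age_08_04": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Weight": [1100.0, 1080.0, 1050.0, 1020.0, 1000.0],
}

correlations = {name: pearson(values, price) for name, values in predictors.items()}
# The strongest negative correlation is the smallest (most negative) r.
strongest_negative = min(correlations, key=correlations.get)
print(strongest_negative, round(correlations[strongest_negative], 3))
```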

The following report shows Excel output for a linear regression model. What can the
p-value of the F-statistic tell us?
A. If this p-value is less than our significance level then the coefficients are significant
B. If this p-value is larger than our significance level then the coefficients are significant
C. If this p-value is larger than our significance level then the model as a whole is
significant
D. If this p-value is less than our significance level then the model as a whole is
significant - Answer-If this p-value is less than our significance level then the model as a
whole is significant
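
The F-statistic behind that p-value compares the variation explained by the model with the variation left in the residuals. A minimal sketch for a one-predictor model with made-up data follows; the function name is hypothetical, and converting F into a p-value needs the F distribution with (p, n - p - 1) degrees of freedom, which is not computed here.

```python
def overall_f_statistic(x, y):
    """F statistic for a simple regression y = b0 + b1 * x (p = 1 predictor)."""
    n = len(x)
    p = 1
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (
        sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        / sum((xi - x_bar) ** 2 for xi in x)
    )
    intercept = y_bar - slope * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)  # error of the average model
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse                            # variation explained by the model
    return (ssr / p) / (sse / (n - p - 1))

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

f_stat = overall_f_statistic(x, y)
print(round(f_stat, 1))
```

A large F gives a small p-value; when that p-value is below the chosen significance level, the model as a whole is judged significant relative to the average model.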

We have developed two different linear regression models on the same data set. Which
model shows a better goodness-of-fit?
Not enough information
Models are the same
Model B
Model A - Answer-Model A
