CSIS 657
Statistical Analysis & Data Mining
COMPLETE MIDTERM REVIEW
© 2024/2025
1. Multiple Choice: Which of the following is a key assumption of linear regression analysis?
a) Homoscedasticity
b) Heteroscedasticity
c) Multicollinearity
d) Autocorrelation
Correct Answer: a) Homoscedasticity
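Homoscedasticity means the variance of the residuals stays roughly constant across values of the predictor. The other three options describe problems rather than assumptions. The sketch below is a rough, informal check for a single predictor. It assumes we simply compare the residual variance in the lower and upper halves of x, loosely in the spirit of the Goldfeld-Quandt test but not a formal test with a p-value. The function names `fit_ols` and `residual_variance_ratio` and the data are hypothetical.

```python
from statistics import mean, pvariance

def fit_ols(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares and return (b0, b1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residual_variance_ratio(xs, ys):
    """Residual variance for the upper half of x divided by that of the lower half.

    Informal check (hypothetical helper): a ratio near 1 is consistent with
    homoscedasticity; a ratio far from 1 suggests the error variance changes
    with x (heteroscedasticity).
    """
    b0, b1 = fit_ols(xs, ys)
    # Sort by x so the two halves correspond to low and high predictor values.
    residuals = [y - (b0 + b1 * x) for x, y in sorted(zip(xs, ys))]
    half = len(residuals) // 2
    return pvariance(residuals[half:]) / pvariance(residuals[:half])

xs = list(range(1, 21))
# Constant spread: noise alternates between +1 and -1.
ys_constant = [2 * x + 1 + (-1) ** x for x in xs]
# Growing spread: noise magnitude increases with x.
ys_growing = [2 * x + 1 + (-1) ** x * 0.5 * x for x in xs]

print(residual_variance_ratio(xs, ys_constant))  # close to 1
print(residual_variance_ratio(xs, ys_growing))   # well above 1
```

A visual residuals-versus-fitted plot serves the same purpose. The ratio is used here only because it fits in a few lines.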
2. Fill-in-the-Blank: In a dataset, the presence of ________ can
significantly affect the performance of a data mining algorithm.
Correct Answer: outliers
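One common way to flag outliers is Tukey's IQR rule. Under this rule, values more than 1.5 times the interquartile range below Q1 or above Q3 are flagged. This is an assumed choice of method, not one prescribed by the review. The helper `iqr_outliers` and the sample values are hypothetical. The example also shows why outliers matter: a single extreme value pulls the mean far from the median.

```python
from statistics import mean, median, quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
flagged = iqr_outliers(data)
clean = [v for v in data if v not in flagged]

print(flagged)                     # [95]
print(mean(data), median(data))    # the outlier drags the mean well above the median
print(mean(clean), median(clean))  # mean and median agree closely once it is removed
```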
3. True/False: Principal Component Analysis (PCA) is a supervised
learning technique.
Correct Answer: False
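PCA is unsupervised because it finds the directions of greatest variance using only the feature values. No target labels are involved. The following is a minimal sketch for two features. It assumes a closed-form eigen-decomposition of the 2 x 2 sample covariance matrix, and the function name `pca_2d` and the points are hypothetical.

```python
import math

def pca_2d(points):
    """Return ((var1, var2), direction) for 2-D points.

    var1 >= var2 are the eigenvalues of the sample covariance matrix
    (variance along each principal component); direction is the unit
    vector of the first component. Only feature values are used.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(half_trace ** 2 - det, 0.0))
    var1, var2 = half_trace + disc, half_trace - disc
    # Eigenvector for var1; fall back to an axis when the features are uncorrelated.
    if abs(sxy) > 1e-12:
        v = (var1 - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    return (var1, var2), (v[0] / norm, v[1] / norm)

points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
(var1, var2), direction = pca_2d(points)
print(direction)              # roughly (0.73, 0.69): the data varies mostly along y = x
print(var1 / (var1 + var2))   # share of variance on the first component, close to 1
```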
4. Multiple Response: Which of the following are benefits of using
a decision tree for data analysis? (Select all that apply)
a) Easy to interpret and explain
b) Requires little data preprocessing
c) Non-parametric method
d) Invariant to feature scaling
Correct Answers: a) Easy to interpret and explain, b) Requires
little data preprocessing, c) Non-parametric method, d) Invariant to
feature scaling
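A tree splits on thresholds (x > t). Multiplying a feature by a positive constant, or applying any other strictly increasing transformation, therefore only moves the best threshold, and the resulting partition of the data is unchanged. That is why trees need little preprocessing and are invariant to feature scaling. The following minimal one-split decision stump illustrates this. It assumes a single feature and the rule "predict 1 if x > t", and the names and data are hypothetical.

```python
def best_stump(xs, labels):
    """Search midpoints between sorted distinct x values for the threshold t
    that minimizes errors of the rule 'predict 1 if x > t, else 0'."""
    values = sorted(set(xs))
    best_t, best_err = None, len(xs) + 1
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        err = sum(int(x > t) != y for x, y in zip(xs, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

def predict(xs, t):
    return [int(x > t) for x in xs]

income_thousands = [20, 35, 50, 65, 80, 95]
labels = [0, 0, 0, 1, 1, 1]
income_dollars = [1000 * x for x in income_thousands]  # same feature, rescaled

t_k, err_k = best_stump(income_thousands, labels)
t_d, err_d = best_stump(income_dollars, labels)
print(t_k, t_d)  # 57.5 57500.0
print(predict(income_thousands, t_k) == predict(income_dollars, t_d))  # True
```

The threshold is rescaled along with the feature, but every record lands on the same side of the split. A distance-based method such as k-nearest neighbors would not have this property.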
5. Multiple Choice: In the context of data mining, 'support' refers
to:
a) The number of times a rule is found to be true
b) The probability of finding a certain pattern in the dataset
c) The reliability of inferred rules
d) The inverse of the error rate
Correct Answer: b) The probability of finding a certain pattern
in the dataset
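In association rule mining, the support of an itemset is the fraction of transactions that contain it, which is the empirical probability of seeing that pattern. Option a) describes the raw support count, and the "reliability" of a rule in option c) is closer to confidence. The following small sketch uses a hypothetical shopping-basket dataset, and the helper names are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread", "milk"}, transactions))        # 0.5 (2 of 4 baskets)
print(confidence({"bread"}, {"milk"}, transactions))   # about 0.667
```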
6. Fill-in-the-Blank: ________ is a measure of the strength and
direction of association between two variables.
Correct Answer: Correlation
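The usual measure here is the Pearson correlation coefficient r. It ranges from -1 to 1 and captures the strength and direction of a linear association. The sketch below (hypothetical helper `pearson_r`) also shows a limitation: a perfect but non-linear, symmetric relationship such as y = x^2 can give r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0 (perfect negative)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0  (y = x^2, no linear trend)
```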
7. True/False: Overfitting occurs when a model captures the noise in
the training data rather than the underlying pattern.
Correct Answer: True
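The following minimal illustration assumes noisy training data drawn from the line y = 2x. A model that memorizes the training set (a 1-nearest-neighbor lookup) reaches zero training error, but it does worse on new points than a simple least-squares line because it has fitted the noise. The data and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x

def fit_memorizer(xs, ys):
    """1-nearest-neighbor lookup: reproduces every training point exactly."""
    pairs = list(zip(xs, ys))
    def predict(x):
        _, nearest_y = min(pairs, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [1, 2, 3, 4, 5, 6]
train_y = [3, 3, 7, 7, 11, 11]   # y = 2x plus alternating noise of +1 / -1
test_x = [1.4, 2.6, 3.4, 4.6, 5.4]
test_y = [2 * x for x in test_x]  # noise-free underlying pattern

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

print(mse(memo, train_x, train_y), mse(line, train_x, train_y))  # memorizer: 0 training error
print(mse(memo, test_x, test_y), mse(line, test_x, test_y))      # but larger test error
```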