2026/2027 Update | 100% Correct.
Introduction to Analytics Modeling | Key Domains: Supervised Learning (Regression, Classification),
Unsupervised Learning (Clustering, Dimensionality Reduction), Time Series Analysis & Forecasting,
Design of Experiments, Model Validation & Selection, Data Preprocessing, and Analytics Application
in Business & Engineering Contexts | Expert-Aligned Structure | Quiz-Ready Format
Introduction
This structured ISYE 6501 Final Quiz for 2026/2027 provides a focused set of high-quality
analytical modeling questions with correct answers and rationales. It emphasizes the selection,
application, and interpretation of appropriate analytics models for real-world data, understanding
the assumptions and limitations of each technique, and evaluating model performance to support
data-driven decision-making.
Quiz Structure:
• Final Quiz: (40 QUESTIONS)
Answer Format
All correct answers must appear in bold and cyan blue, accompanied by concise rationales
explaining why a specific model is most appropriate for the given data/scenario, how to interpret
key output metrics (R-squared, p-value, confusion matrix, elbow plot), the correct validation
approach, and why alternative model choices or interpretations are statistically or conceptually
flawed.
1. A data scientist is modeling house prices (continuous) using square footage, number of
bedrooms, and location (categorical). Which technique is most appropriate?
A. Logistic regression
B. K-means clustering
C. Linear regression
D. Principal component analysis (PCA)
C. Linear regression
Linear regression models the relationship between a continuous dependent variable (house price) and
one or more predictors, with categorical predictors such as location encoded as dummy variables.
Logistic regression (A) is for categorical (typically binary) outcomes. K-means (B) and PCA (D) are
unsupervised and do not predict a target variable.
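As an illustration of the dummy-variable encoding, here is a minimal sketch of ordinary least squares in plain Python. The data set, the coefficient values, and the helper names (design_row, solve, fit_ols) are all invented for this example. Prices are generated from known coefficients so the fit can be checked, and "rural" is arbitrarily chosen as the baseline location.

```python
# Hypothetical toy data: (square footage in thousands, bedrooms, location).
houses = [
    (1.0, 2, "rural"), (1.5, 3, "suburban"), (2.0, 3, "urban"),
    (1.2, 2, "suburban"), (1.8, 4, "urban"), (2.4, 4, "rural"),
    (1.6, 3, "rural"), (2.2, 5, "urban"), (1.4, 2, "suburban"),
]

# Assumed "true" coefficients (price in $ thousands), used only to build synthetic prices:
# intercept, per 1000 sq ft, per bedroom, suburban premium, urban premium.
TRUE_COEFS = [50.0, 100.0, 10.0, 15.0, 30.0]


def design_row(sqft_k, bedrooms, location):
    """Encode one house; 'rural' is the baseline, other locations get 0/1 dummies."""
    return [
        1.0,                                   # intercept
        sqft_k,
        float(bedrooms),
        1.0 if location == "suburban" else 0.0,
        1.0 if location == "urban" else 0.0,
    ]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


X = [design_row(*h) for h in houses]
y = [sum(c * v for c, v in zip(TRUE_COEFS, row)) for row in X]  # noise-free synthetic prices
coefs = fit_ols(X, y)
print([round(c, 6) for c in coefs])
```

Because the synthetic prices contain no noise, the fitted coefficients should reproduce the assumed ones up to rounding. With real data, a statistics library would normally be used instead of hand-written normal equations.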
2. In k-fold cross-validation, what is the primary purpose of averaging performance across
folds?
A. To reduce model bias
B. To obtain a more robust estimate of out-of-sample error
C. To increase training data size
D. To select the best hyperparameter k for KNN
B. To obtain a more robust estimate of out-of-sample error
Averaging over k folds gives a less variable estimate of out-of-sample performance than a single
train/test split, because every observation is used for validation exactly once. It does not reduce
model bias (A) or increase the amount of data (C). The “k” in k-fold is unrelated to KNN’s k (D).
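The averaging step can be sketched as follows. This is a simplified illustration under stated assumptions: the data are synthetic, the model is a one-variable least-squares line, and folds are assigned by striding through the indices instead of by random shuffling so that the run is deterministic. The helper names are hypothetical.

```python
def k_fold_splits(n, k):
    """Return k (train_idx, val_idx) pairs; fold i validates on indices i, i+k, i+2k, ..."""
    splits = []
    for fold in range(k):
        val_idx = list(range(fold, n, k))
        val_set = set(val_idx)
        train_idx = [i for i in range(n) if i not in val_set]
        splits.append((train_idx, val_idx))
    return splits


def fit_line(xs, ys):
    """Simple least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mse(slope, intercept, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: a line plus a small deterministic disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

K = 5
splits = k_fold_splits(len(xs), K)
fold_mse = []
for train_idx, val_idx in splits:
    # Fit on the training folds only, then score on the held-out fold.
    slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    fold_mse.append(mse(slope, intercept, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))

cv_mse = sum(fold_mse) / K  # the averaged estimate of out-of-sample error
print(fold_mse, cv_mse)
```

The per-fold errors differ from one another; the average is the quantity reported as the cross-validation estimate.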
3. A confusion matrix for a binary classifier shows 90 true positives, 5 false positives, 3 false
negatives, and 102 true negatives. What is the recall (sensitivity)?
A. 90 / (90 + 5) = 0.947
B. 90 / (90 + 3) = 0.968
C. 102 / (102 + 5) = 0.953
D. (90 + 102) / (90 + 5 + 3 + 102) = 0.96
B. 90 / (90 + 3) = 0.968
Recall = TP / (TP + FN) = 90 / (90 + 3) ≈ 0.968. It measures the model’s ability to identify all actual
positives. Option A is precision. Option C is specificity. Option D is accuracy.
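The four metrics named in the options can be computed directly from the counts given in the question. The short sketch below uses those counts; the variable names are chosen for this example.

```python
# Counts taken from the question's confusion matrix.
tp, fp, fn, tn = 90, 5, 3, 102

recall = tp / (tp + fn)                      # sensitivity: share of actual positives found
precision = tp / (tp + fp)                   # share of predicted positives that are correct
specificity = tn / (tn + fp)                 # share of actual negatives correctly rejected
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct

for name, value in [("recall", recall), ("precision", precision),
                    ("specificity", specificity), ("accuracy", accuracy)]:
    print(f"{name}: {value:.3f}")
```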
4. When applying PCA, what does the first principal component represent?
A. The direction of least variance in the data
B. The eigenvector with the smallest eigenvalue
C. The direction of maximum variance in the data
D. The mean-centered data vector
C. The direction of maximum variance in the data
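PCA identifies orthogonal axes (principal components) that capture maximum variance. The first PC
corresponds to the eigenvector of the covariance matrix with the largest eigenvalue, representing the
direction of greatest data spread. Options A and B both describe the last principal component, and
mean-centering (D) is a preprocessing step rather than a component.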
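One way to see the link between the first principal component and the largest eigenvalue is to compute it by power iteration on the sample covariance matrix. The sketch below is illustrative only: the six two-dimensional points are invented, and power iteration is swapped in for a full eigendecomposition because it needs no external library.

```python
import math


def covariance_matrix(points):
    """Sample covariance matrix of mean-centered data (divides by n - 1)."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Power iteration: repeatedly apply cov and renormalize.

    Converges to the eigenvector with the largest eigenvalue, provided the
    starting vector is not orthogonal to it.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v = variance of the data along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigenvalue


# Hypothetical points lying close to the line y = x.
points = [(2.0, 1.9), (-1.0, -1.2), (0.5, 0.7), (-2.0, -1.8), (1.0, 0.8), (-0.5, -0.4)]

cov = covariance_matrix(points)
pc1, variance_pc1 = first_principal_component(cov)
print(pc1, variance_pc1)
```

For these points the direction found is close to the diagonal, and the variance along it exceeds the variance along either original axis.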
5. A time series exhibits a clear upward trend and annual seasonality. Which model is most
appropriate for forecasting?
A. Simple exponential smoothing
B. Holt-Winters exponential smoothing
C. ARIMA(0,0,0)
D. K-means clustering
B. Holt-Winters exponential smoothing
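Holt-Winters (triple exponential smoothing) explicitly models level, trend, and seasonality. Simple
exponential smoothing (A) models only the level, with no trend or seasonal component. ARIMA(0,0,0) (C)
is a white-noise model with no autoregressive, differencing, or moving-average terms. K-means (D) is
unsupervised and not for forecasting.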
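The three smoothing equations behind additive Holt-Winters can be sketched in a few lines. This is a simplified illustration, not a production implementation: the quarterly series is synthetic, the smoothing constants are arbitrary, and the initialization (trend from the difference of the first two seasonal means, seasonal terms from the detrended first season) is one simple choice among several in use.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns forecasts for the next `horizon` periods."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons to initialize")

    # Initialization (an assumption of this sketch).
    mean_first = sum(series[:m]) / m
    mean_second = sum(series[m:2 * m]) / m
    trend = (mean_second - mean_first) / m
    level = mean_first - trend * (m + 1) / 2   # level one step before the first observation
    seasonal = [series[i] - (level + trend * (i + 1)) for i in range(m)]

    for t, y in enumerate(series):
        s = seasonal[t % m]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)   # level update
        trend = beta * (level - prev_level) + (1 - beta) * trend       # trend update
        seasonal[t % m] = gamma * (y - level) + (1 - gamma) * s        # seasonal update

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % m] for h in range(1, horizon + 1)]


# Synthetic quarterly series: upward trend plus a repeating seasonal pattern.
pattern = [5.0, -3.0, -4.0, 2.0]
series = [100.0 + 2.0 * t + pattern[t % 4] for t in range(12)]

forecast = holt_winters_additive(series, season_length=4,
                                 alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
expected = [100.0 + 2.0 * t + pattern[t % 4] for t in range(12, 16)]
print(forecast)
print(expected)
```

Because the synthetic series is an exact trend plus a fixed seasonal pattern, the forecasts should continue it almost exactly. Real series contain noise, and the smoothing constants are then usually chosen by minimizing forecast error.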
6. In designing an experiment to test three factors (each at two levels), a full factorial design
requires how many experimental runs?
A. 6