A. Supervised learning uses unlabeled data, unsupervised uses labeled data
B. Supervised learning learns from labeled data, unsupervised from unlabeled data
C. Supervised learning clusters data, unsupervised predicts outputs
D. Both learn from labeled outputs
Answer: B
Rationale: Supervised learning maps inputs to known outputs using labeled data, while
unsupervised learning finds patterns in unlabeled data.
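For illustration, a minimal scikit-learn sketch contrasting the two settings on synthetic data (the dataset and model choices here are assumptions, not part of the question):
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Synthetic 2-D points forming two loose blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)  # labels are only available in the supervised case

# Supervised: learn a mapping from inputs X to known labels y
clf = LogisticRegression().fit(X, y)

# Unsupervised: find structure in X alone, with no labels provided
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(clf.predict(X[:3]))   # predicted labels
print(km.labels_[:3])       # discovered cluster assignments
```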
2. What does the bias-variance tradeoff refer to?
A. Balancing underfitting and overfitting
B. Increasing model speed
C. Reducing training data size
D. Adjusting input variables
Answer: A
Rationale: A model with high bias underfits (too simple), and one with high variance
overfits (too complex). Good models balance both.
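A small sketch of the tradeoff, assuming scikit-learn and a synthetic noisy sine curve: a low polynomial degree underfits (high bias), a very high degree overfits (high variance):
```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    # degree 1: high bias; degree 15: high variance; degree 4: a reasonable balance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),
          mean_squared_error(y_te, model.predict(X_te)))
```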
3. What is overfitting in machine learning?
A. The model fails to capture relationships
B. The model performs better on test data
C. The model memorizes training data and performs poorly on new data
D. The model has no variance
Answer: C
Rationale: Overfitting happens when a model learns noise and outliers in the training data,
reducing its ability to generalize.
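A quick way to see this, assuming scikit-learn: an unconstrained decision tree scores near-perfectly on its training data but noticeably worse on held-out data:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, which an unconstrained tree will happily memorize
X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("train accuracy:", tree.score(X_tr, y_tr))  # typically near 1.0
print("test accuracy: ", tree.score(X_te, y_te))  # typically noticeably lower
```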
4. How does regularization help reduce overfitting?
A. It increases model parameters
B. It removes features
C. It penalizes large weights in the loss function
D. It trains the model faster
Answer: C
Rationale: Regularization discourages complex models by penalizing large coefficients,
improving generalization.
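A minimal sketch of the penalty at work, assuming scikit-learn's Ridge (L2) on synthetic regression data:
```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.datasets import make_regression

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2: the alpha term penalizes large weights.
X, y = make_regression(n_samples=50, n_features=30, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("mean |w|, no penalty:  ", np.abs(ols.coef_).mean())
print("mean |w|, L2 penalty:  ", np.abs(ridge.coef_).mean())  # shrunk toward zero
```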
5. Why is cross-validation important?
A. It reduces model size
B. It increases training data
C. It tests model performance on multiple data splits
D. It speeds up training
Answer: C
Rationale: Cross-validation helps assess how the model will generalize by rotating train-test
splits, detecting overfitting effectively.
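A minimal cross-validation sketch, assuming scikit-learn and the bundled iris dataset:
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 5-fold CV: each fold serves once as the held-out split, so the scores
# reflect performance across multiple train-test rotations.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())
```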
6. What is the difference between bagging and boosting?
A. Bagging reduces bias; boosting reduces variance
B. Bagging is sequential; boosting is parallel
C. Bagging is parallel, boosting is sequential
D. They are the same method
Answer: C
Rationale: Bagging runs models in parallel to reduce variance, while boosting trains models
sequentially to reduce bias.
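A sketch contrasting the two ensembles, assuming scikit-learn's BaggingClassifier and AdaBoostClassifier on synthetic data:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: independent trees on bootstrap samples (trainable in parallel); averaging cuts variance.
bag = BaggingClassifier(n_estimators=50, random_state=0)
# Boosting: estimators fit sequentially, each focusing on the previous ones' mistakes; cuts bias.
boost = AdaBoostClassifier(n_estimators=50, random_state=0)

print("bagging :", cross_val_score(bag, X, y, cv=5).mean())
print("boosting:", cross_val_score(boost, X, y, cv=5).mean())
```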
7. What is the curse of dimensionality?
A. Data becomes easier to analyze in high dimensions
B. All distances between data points become meaningful
C. High-dimensional data becomes sparse and harder to model
D. More features improve model generalization
Answer: C
Rationale: As dimensions increase, data becomes sparse and less meaningful, making
models prone to overfitting and needing more data.
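A small numerical sketch of the effect, assuming NumPy and uniformly sampled points: the ratio of nearest to farthest distance approaches 1 as the dimension grows, so distances stop discriminating between points:
```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)  # distances from the first point
    print(d, dists.min() / dists.max())            # ratio creeps toward 1 in high d
```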
8. What does the ROC curve represent?
A. Regression accuracy vs. recall
B. Precision vs. recall
C. True positive rate vs. false positive rate
D. Sensitivity vs. specificity
Answer: C
Rationale: The ROC curve plots sensitivity (TPR) against FPR at various thresholds, and
AUC shows overall model performance.
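A minimal ROC sketch, assuming scikit-learn: compute TPR and FPR across thresholds from predicted probabilities and summarize with AUC:
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, scores)  # TPR vs FPR at each threshold
print("AUC:", roc_auc_score(y_te, scores))
```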
9. How does L1 regularization differ from L2 regularization?
A. L1 uses squared coefficients, L2 uses absolute values
B. L1 drives coefficients to zero, L2 shrinks them
C. L1 increases model complexity, L2 reduces it
D. L1 applies only to neural networks
Answer: B
Rationale: L1 (Lasso) can eliminate features by shrinking weights to zero. L2 (Ridge)
penalizes weights but keeps all features.
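A sketch of the difference, assuming scikit-learn's Lasso (L1) and Ridge (L2) on synthetic data with only a few informative features:
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Lasso's absolute-value penalty can zero out coefficients entirely;
# Ridge's squared penalty shrinks them but rarely makes any exactly zero.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("coefficients set to zero (Lasso):", np.sum(lasso.coef_ == 0))
print("coefficients set to zero (Ridge):", np.sum(ridge.coef_ == 0))
```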
10. What is the cold start problem in recommender systems?
A. Items load slowly due to server issues
B. New users or items lack data for recommendations
C. Recommendations are only for popular items
D. Only collaborative filtering can be used
Answer: B
Rationale: Without historical data, it's difficult to make accurate recommendations for new
users or items.
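A minimal sketch of one common mitigation, a popularity fallback, assuming NumPy and a toy user-item rating matrix (all names and data here are illustrative):
```python
import numpy as np

# Toy user-by-item rating matrix; 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [0, 5, 4, 0],
])

def recommend_for_new_user(ratings, k=2):
    # Cold-start fallback: with no history for the user, rank items by how many
    # existing users have interacted with them (popularity).
    popularity = (ratings > 0).sum(axis=0)
    return np.argsort(popularity)[::-1][:k]

print(recommend_for_new_user(ratings))  # indices of the most-rated items
```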
11. What are principal components in PCA?
A. Original input features
B. Orthogonal directions maximizing variance
C. Labels for supervised learning
D. Randomly chosen dimensions
Answer: B
Rationale: Principal components are new axes (eigenvectors) capturing maximum variance
in the dataset.
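A minimal PCA sketch, assuming scikit-learn and the iris dataset: the components are orthogonal directions ordered by the variance they capture:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2).fit(X)
print(pca.components_)                # orthogonal directions (eigenvectors) in feature space
print(pca.explained_variance_ratio_)  # fraction of total variance each component captures

# Orthogonality check: dot product of the two directions is ~0
print(np.dot(pca.components_[0], pca.components_[1]))
```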
12. How do generative and discriminative models differ?
A. Generative models create labels, discriminative models create inputs
B. Generative learns P(Y|X), discriminative learns P(X,Y)
C. Generative learns P(X,Y), discriminative learns P(Y|X)
D. Both models predict classes only