CERTIFICATION EXAM
174 QUESTIONS AND ANSWERS
1. What is the difference between supervised and unsupervised learning?
• Answer: Supervised learning uses labeled data where the
algorithm learns to map inputs to known outputs, while
unsupervised learning works with unlabeled data to identify
patterns or structures without predefined outputs.
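A minimal sketch of the contrast, assuming scikit-learn purely for
illustration: the supervised estimator is fit on inputs and labels,
while the unsupervised one sees only the inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # known labels

# Supervised: learn a mapping from inputs X to known outputs y.
clf = LogisticRegression().fit(X, y)

# Unsupervised: find structure in X alone; no y is ever provided.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
```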
2. What is the bias-variance tradeoff?
• Answer: The bias-variance tradeoff is the balance between a
model's ability to fit the training data (low bias) and its ability
to generalize to new data (low variance). High-complexity
models tend to have low bias but high variance, while simpler
models have higher bias but lower variance.
3. What is overfitting in machine learning?
• Answer: Overfitting occurs when a model learns the training
data too well, including its noise and outliers, resulting in poor
performance on unseen data. The model essentially memorizes
the training examples rather than learning generalizable
patterns.
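A short illustration of the symptom, using an assumed scikit-learn
polynomial-regression setup: the high-degree model fits the training
data almost perfectly, but its error on held-out data balloons.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 15):  # simple model vs. overly complex model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # large gap = overfit
```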
4. How does regularization help prevent overfitting?
• Answer: Regularization adds a penalty term to the loss
function that discourages complex models by constraining
parameter values. This reduces model variance and improves
generalization to new data by preventing the model from
fitting noise in the training data.
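A sketch of the idea in plain NumPy, with an illustrative penalty
weight lam: the loss gains a term that grows with the size of the
weights, so large-weight (complex) solutions are penalized.

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    """MSE plus an L2 penalty discouraging large weights (lam is illustrative)."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)
    penalty = lam * np.sum(w ** 2)  # the regularization term
    return mse + penalty
```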
5. What is cross-validation and why is it important?
• Answer: Cross-validation is a technique where the dataset is
split into multiple subsets, with different parts used for training
and validation in iterations. It's important because it provides a
more reliable estimate of
model performance on unseen data compared to a single train-test
split, helping detect overfitting.
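A minimal example, assuming scikit-learn and its bundled iris
dataset: each of five folds serves once as the validation set.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# 5-fold CV: each fold is the validation set once while the rest train.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())  # more reliable than a single split
```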
6. Explain the difference between bagging and boosting.
• Answer: Bagging (Bootstrap Aggregating) trains multiple
models in parallel on random subsets of data and averages their
predictions to reduce variance. Boosting trains models
sequentially, with each model focusing on examples previous
models performed poorly on, combining them with weighted
voting to reduce bias.
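A hedged sketch using scikit-learn's ensemble module (the dataset
and estimator counts are arbitrary): BaggingClassifier fits
independent trees on bootstrap samples, while AdaBoostClassifier
fits trees sequentially, reweighting the examples the previous
trees got wrong.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: independent trees on bootstrap samples, predictions averaged.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50).fit(X, y)

# Boosting: trees trained in sequence, each focusing on prior errors.
boost = AdaBoostClassifier(n_estimators=50).fit(X, y)
```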
7. What is the curse of dimensionality?
• Answer: The curse of dimensionality refers to various challenges
that arise when analyzing data in high-dimensional spaces. As
dimensions increase, data becomes sparse, distances between
points become less meaningful, and models require
exponentially more data to generalize effectively.
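One way to see the distance effect numerically (a toy sketch; the
point count and dimensions are chosen arbitrarily): the relative
contrast between the nearest and farthest point shrinks as the
number of dimensions grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((200, d))
    # Distances from the first point to all the others.
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    ratio = (dists.max() - dists.min()) / dists.min()
    print(d, round(ratio, 3))  # contrast shrinks as dimensionality grows
```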
8. What is the ROC curve and what does AUC represent?
• Answer: The Receiver Operating Characteristic (ROC) curve plots
the true positive rate against the false positive rate at various
classification thresholds. The Area Under the Curve (AUC)
represents the probability that the classifier will rank a
randomly chosen positive instance higher than a randomly
chosen negative one, with 1.0 being perfect classification.
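A minimal sketch with scikit-learn (the dataset and model are
chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probs = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
fpr, tpr, thresholds = roc_curve(y_te, probs)  # TPR vs. FPR per threshold
print(roc_auc_score(y_te, probs))              # 1.0 would be perfect ranking
```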
9. Explain the difference between L1 and L2 regularization.
• Answer: L1 regularization (Lasso) adds the sum of the absolute
values of the coefficients to the loss function, which can drive some
coefficients to exactly zero, performing feature selection. L2
regularization (Ridge) adds the sum of squared coefficients, which
shrinks all coefficients proportionally but rarely to exactly zero.
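The feature-selection effect is easy to observe; a sketch assuming
scikit-learn, with alpha=1.0 as an arbitrary penalty strength:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 drives uninformative coefficients to exactly zero; L2 only shrinks them.
print("Lasso zeros:", np.sum(lasso.coef_ == 0))
print("Ridge zeros:", np.sum(ridge.coef_ == 0))
```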
10. What is the cold start problem in recommendation systems?
• Answer: The cold start problem occurs when a recommendation
system cannot make reliable recommendations due to insufficient
data about new users or items. Without historical interaction
data, the system struggles to identify preferences or similarities
needed for accurate recommendations.
11. What are principal components in PCA?
• Answer: Principal components are orthogonal vectors that
represent directions of maximum variance in the data. They are
eigenvectors of the covariance matrix, ranked by their
corresponding eigenvalues, and form a new coordinate system
where data dimensions are uncorrelated.
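A from-scratch sketch in NumPy following the eigendecomposition
described above (the synthetic correlated data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # correlated features

Xc = X - X.mean(axis=0)                      # center the data
cov = np.cov(Xc, rowvar=False)               # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition (symmetric)
order = np.argsort(eigvals)[::-1]            # rank by explained variance
components = eigvecs[:, order]               # principal components (columns)

Z = Xc @ components                          # project into the new coordinates
print(np.round(np.cov(Z, rowvar=False), 6))  # off-diagonals ~0: uncorrelated
```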
12. Explain the difference between a generative and discriminative
model.
• Answer: Generative models learn the joint probability
distribution P(X,Y) to understand how data is generated,
allowing them to create new samples. Discriminative models
learn the conditional probability P(Y|X) to focus on decision
boundaries between classes for classification tasks.
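A rough illustration with scikit-learn, where Gaussian Naive Bayes
is a classic generative model and logistic regression a classic
discriminative one (the dataset is arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB           # generative: models P(X|Y)P(Y)
from sklearn.linear_model import LogisticRegression  # discriminative: models P(Y|X)

X, y = make_classification(n_samples=500, random_state=0)

gen = GaussianNB().fit(X, y)                        # per-class feature distributions
disc = LogisticRegression(max_iter=1000).fit(X, y)  # decision boundary directly
```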
13. What is transfer learning and when is it useful?
• Answer: Transfer learning is a technique where a model
developed for one task is reused as the starting point for a
model on a second task. It's useful when the target task has
limited training data, when the source and target tasks share
similarities, or when pre-trained models capture relevant
features that transfer well.
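A common sketch of the pattern, assuming PyTorch/torchvision and an
arbitrary 10-class target task: the pre-trained backbone is frozen
and only a new output layer is trained.

```python
import torch.nn as nn
from torchvision import models

# Start from weights learned on ImageNet (the "source" task).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for a new 10-class "target" task; only it trains.
model.fc = nn.Linear(model.fc.in_features, 10)
```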
14. What is the difference between batch, mini-batch, and
stochastic gradient descent?
• Answer: Batch gradient descent computes gradients using the
entire dataset in each iteration. Mini-batch uses random subsets
of data for each update. Stochastic gradient descent uses just
one example per update. Mini-batch balances computational
efficiency with update stability, while stochastic provides the
noisiest but most frequent updates.
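A minimal NumPy sketch of the mini-batch variant (the batch size of
32 and the linear-regression setup are illustrative); the closing
comment notes how the other two variants differ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
w, lr = np.zeros(5), 0.01

def grad(w, Xb, yb):
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)  # MSE gradient on a batch

for epoch in range(10):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), 32):         # batch size 32 is illustrative
        b = idx[start:start + 32]              # mini-batch: a random subset
        w -= lr * grad(w, X[b], y[b])
# Batch GD would call grad(w, X, y) once per step; SGD would use batches of 1.
```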
15. What is the vanishing gradient problem?
• Answer: The vanishing gradient problem occurs when gradients
become extremely small as they propagate backward through
many layers of a deep neural network. This makes it difficult to
update weights in earlier layers, slowing or preventing learning
in those parts of the network.
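The effect can be seen from the sigmoid's derivative alone (a toy
calculation, not a full network): backpropagation multiplies in one
such factor per layer, and each factor is at most 0.25, so the
product shrinks geometrically with depth.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.5
local_grad = sigmoid(z) * (1 - sigmoid(z))  # sigmoid derivative, at most 0.25

# One such factor per layer: the backpropagated gradient decays fast.
for depth in (5, 10, 20, 50):
    print(depth, local_grad ** depth)
```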
16. How does batch normalization help in training deep networks?
• Answer: Batch normalization normalizes the inputs to each layer
by subtracting the batch mean and dividing by the batch standard
deviation, which stabilizes the distribution of layer inputs and
allows faster, more stable training.
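A minimal NumPy sketch of the normalization step just described
(eps is a standard numerical-stability constant; the learnable
scale and shift that typically follow are noted in a comment):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch of layer inputs; eps guards against division by zero."""
    mean = x.mean(axis=0)            # batch mean, per feature
    std = x.std(axis=0)              # batch standard deviation, per feature
    x_hat = (x - mean) / (std + eps)
    # In practice a learnable scale (gamma) and shift (beta) follow:
    # return gamma * x_hat + beta
    return x_hat
```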