ARTIBA Artificial Intelligence Engineer Certification Practice Exam
latest questions with verified answer | pdf
1. In a Linear Regression model, which metric is most sensitive to outliers?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Median Absolute Deviation
D) R-Squared
Answer: B
MSE squares the residuals, so large errors (outliers) have a quadratically larger
impact on the loss function than small errors. Scikit-Learn Regression Metrics
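To see this concretely, here is a minimal pure-Python sketch with toy residual values chosen for illustration:

```python
# Toy residuals: a few small errors plus one outlier.
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

clean = [1.0, -1.0, 2.0, -2.0]
with_outlier = clean + [20.0]  # one large residual

# MAE grows modestly; MSE jumps because the outlier is squared (400 vs. 20).
print(mae(clean), mae(with_outlier))  # 1.5 5.2
print(mse(clean), mse(with_outlier))  # 2.5 82.0
```

Adding a single outlier roughly triples MAE but multiplies MSE by more than thirty.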
2. Which activation function is most likely to suffer from the "Dying ReLU"
problem?
A) Sigmoid
B) Tanh
C) ReLU (Rectified Linear Unit)
D) Leaky ReLU
Answer: C
ReLU outputs zero for any negative input. If a large gradient update pushes the weights
so that the neuron's pre-activation is always negative, its gradient becomes zero and it
stays "dead" for the rest of training.
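A minimal sketch of why the neuron cannot recover, comparing ReLU's gradient with Leaky ReLU's on toy pre-activation values:

```python
def relu_grad(z):
    return 1.0 if z > 0 else 0.0     # zero gradient everywhere below 0

def leaky_relu_grad(z, alpha=0.01):
    return 1.0 if z > 0 else alpha   # small but nonzero gradient below 0

# A neuron whose pre-activation is always negative receives zero gradient
# under ReLU, so its weights can never change -- the neuron is "dead".
pre_activations = [-3.0, -0.5, -1.2]
print([relu_grad(z) for z in pre_activations])        # [0.0, 0.0, 0.0]
print([leaky_relu_grad(z) for z in pre_activations])  # [0.01, 0.01, 0.01]
```

The small negative slope is exactly why Leaky ReLU (option D) avoids the problem.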
3. What is the primary purpose of "Regularization" (L1/L2) in Machine Learning?
A) To speed up training time
B) To reduce bias in the model
C) To prevent overfitting by penalizing large weights
D) To increase the number of features
Answer: C
Regularization adds a penalty term to the loss function to keep weights small, helping
the model generalize better to unseen data.
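As a sketch, an L2 (Ridge-style) penalty simply adds the sum of squared weights to the loss, so larger weights cost more; all numbers here are illustrative:

```python
def l2_penalized_loss(residuals, weights, lam):
    mse = sum(r * r for r in residuals) / len(residuals)
    penalty = lam * sum(w * w for w in weights)  # L1 would use sum(abs(w))
    return mse + penalty

residuals = [0.5, -0.3]
small_w = [0.2, 0.1]
large_w = [3.0, 4.0]

# Same data fit, but the large-weight model pays a much bigger penalty.
print(l2_penalized_loss(residuals, small_w, lam=0.1))  # small penalty
print(l2_penalized_loss(residuals, large_w, lam=0.1))  # dominated by the penalty
```

The optimizer is thus pushed toward small weights unless large ones clearly earn their keep.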
4. In Deep Learning, "Backpropagation" is essentially an application of which
mathematical rule?
A) The Power Rule
B) The Chain Rule
C) The Product Rule
D) L'Hôpital's Rule
Answer: B
Backpropagation calculates the gradient of the loss function with respect to each weight
by traversing backward through the layers using the Chain Rule of
calculus. DeepLearning.ai Fundamentals
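A worked sketch on a two-weight toy network, multiplying the chain-rule factors by hand (the weights and input are arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny network: h = sigmoid(w1*x), y_hat = sigmoid(w2*h), loss = (y_hat - y)^2.
x, y = 1.0, 0.0
w1, w2 = 0.5, -0.3

h = sigmoid(w1 * x)
y_hat = sigmoid(w2 * h)

# Backprop = chain rule, one local derivative per factor:
dL_dyhat = 2.0 * (y_hat - y)
dyhat_dz2 = y_hat * (1.0 - y_hat)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
dz2_dh = w2
dh_dz1 = h * (1.0 - h)
dz1_dw1 = x
dL_dw1 = dL_dyhat * dyhat_dz2 * dz2_dh * dh_dz1 * dz1_dw1
print(dL_dw1)  # gradient of the loss w.r.t. the first-layer weight
```

A finite-difference check on the loss confirms the product of local derivatives is the true gradient.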
5. Which of the following is an "Unsupervised Learning" task?
A) Predicting house prices based on square footage
B) Classifying emails as Spam or Not Spam
C) Segmenting customers into groups based on purchasing behavior
D) Identifying handwritten digits (MNIST)
Answer: C
Customer segmentation (Clustering) involves finding hidden patterns in unlabeled data,
which is the definition of Unsupervised Learning.
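A one-step sketch of the clustering idea (the k-means assignment step) on hypothetical 1-D customer spend values; note that no labels appear anywhere:

```python
# Assign each point to its nearest centroid; this is the core unsupervised
# step of k-means (centroid updates would follow in a full implementation).
def assign(points, centroids):
    return [min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points]

spend = [10, 12, 11, 95, 100, 98]  # two obvious spending groups
centroids = [11.0, 97.0]
print(assign(spend, centroids))  # [0, 0, 0, 1, 1, 1]
```

The two customer segments emerge purely from the structure of the data.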
6. What does the "Kernel Trick" allow a Support Vector Machine (SVM) to do?
A) Reduce the number of support vectors
B) Solve non-linear classification problems in a higher-dimensional space
C) Speed up the training of Decision Trees
D) Automatically clean missing data
Answer: B
The Kernel Trick maps data into a higher dimension where a linear hyperplane can
separate classes that are not linearly separable in the original space. ARTIBA AIE
Framework
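The trick can be verified numerically: a polynomial kernel k(a, b) = (a·b)² equals an ordinary dot product in a 3-D feature space, without that space ever being constructed during training:

```python
import math

def kernel(a, b):
    # (a . b)^2, computed entirely in the original 2-D space
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def phi(x):
    # The explicit higher-dimensional map the kernel implicitly uses
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

a, b = (1.0, 2.0), (3.0, 0.5)
explicit = sum(u * v for u, v in zip(phi(a), phi(b)))
print(kernel(a, b), explicit)  # equal, up to floating-point rounding
```

The SVM only ever needs these inner products, so it can separate data linearly in the mapped space at the cost of a cheap kernel evaluation.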
7. In a Convolutional Neural Network (CNN), what is the function of a "Pooling
Layer"?
A) To increase the number of parameters
B) To reduce the spatial dimensions (width/height) of the input volume
C) To apply the activation function
D) To flatten the image into a 1D vector
Answer: B
Pooling (e.g., Max Pooling) reduces the computational load and helps extract dominant
features that are invariant to small shifts.
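A minimal 2×2 max-pooling sketch on a toy 4×4 feature map, halving each spatial dimension:

```python
# 2x2 max pooling with stride 2: keep the largest value in each block.
def max_pool_2x2(img):
    h, w = len(img), len(img[0])
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 3]]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 7]]
```

Shifting a strong activation by one pixel within its 2×2 block leaves the pooled output unchanged, which is the shift-invariance the explanation refers to.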
8. Which evaluation metric is best suited for a highly imbalanced dataset (e.g., 99%
Class A, 1% Class B)?
A) Accuracy
B) F1-Score
C) R-Squared
D) Mean Squared Error
Answer: B
Accuracy is misleading in imbalanced sets. F1-Score provides a balance between
Precision and Recall, focusing on the performance of the minority class.
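The failure mode is easy to reproduce: a model that always predicts the majority class on a 99:1 split scores 99% accuracy but zero F1 (computed here by hand in pure Python):

```python
# A classifier that always predicts the majority class on a 99:1 dataset.
y_true = [0] * 99 + [1]
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

print(accuracy, f1)  # 0.99 0.0 -- accuracy looks great, F1 exposes the failure
```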
9. What is the difference between "Bagging" and "Boosting"?
A) Bagging trains models sequentially; Boosting trains them in parallel.
B) Bagging reduces variance (e.g., Random Forest); Boosting reduces bias (e.g.,
XGBoost).
C) Bagging is for regression; Boosting is for classification.
D) There is no difference.
Answer: B
Bagging (Bootstrap Aggregating) averages independent models to reduce variance.
Boosting builds models sequentially, with each new model correcting the errors of the
previous ones.
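A sketch of the boosting half of this contrast: each stage fits the residuals left by the stages before it. The "weak learner" here is just a constant (the mean residual), purely for illustration:

```python
y = [3.0, 5.0, 7.0]
prediction = [0.0, 0.0, 0.0]
lr = 0.5  # shrinkage / learning rate

for stage in range(20):
    residuals = [t - p for t, p in zip(y, prediction)]
    stump = sum(residuals) / len(residuals)  # trivial constant weak learner
    prediction = [p + lr * stump for p in prediction]

# Sequential residual fitting drives the predictions toward the best this
# weak learner can do (the mean of y). Bagging would instead average the
# outputs of independently trained models to reduce variance.
print(prediction)
```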
10. In Natural Language Processing (NLP), what is "Stemming"?
A) Converting a word to its dictionary base form (lemma)
B) Removing stop words like "the" and "is"
C) Reducing a word to its root by chopping off affixes (e.g., "running" to "run")
D) Predicting the next word in a sentence
Answer: C
Stemming is a crude heuristic process that chops off the ends of words, whereas
Lemmatization uses vocabulary and morphological analysis. NLTK Documentation
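A toy suffix-chopping stemmer makes the "crude heuristic" point; note it yields "runn", not the lemma "run" (real stemmers such as NLTK's PorterStemmer apply many more rules):

```python
def crude_stem(word):
    # Chop the first matching suffix, keeping at least a 3-letter stem.
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["running", "jumped", "quickly", "cats"]])
# ['runn', 'jump', 'quick', 'cat']
```

A lemmatizer, by contrast, would consult a vocabulary and return true dictionary forms.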
11. In Gradient Descent, what happens if the "Learning Rate" is set too high?
A) The model will take too long to converge.
B) The loss function may overshoot the global minimum and fail to converge.
C) The model will automatically switch to a Stochastic approach.
D) The weights will be regularized using L2.
Answer: B
An excessively large learning rate causes the weight updates to jump across the
"valley" of the loss function, potentially increasing the error instead of decreasing
it. ARTIBA AIE™ Framework
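The overshoot is visible even on the simplest loss, f(w) = w², whose gradient is 2w (the learning-rate values are chosen for illustration):

```python
def descend(lr, steps=10, w=1.0):
    for _ in range(steps):
        w = w - lr * 2.0 * w  # gradient of w^2 is 2w
    return w

print(descend(lr=0.1))  # shrinks steadily toward the minimum at 0
print(descend(lr=1.1))  # each update overshoots; |w| grows and diverges
```

With lr = 0.1 each step multiplies w by 0.8; with lr = 1.1 it multiplies w by -1.2, so the iterate bounces across the valley with increasing magnitude.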
12. Which technique is specifically used to address "Internal Covariate Shift"
in Deep Neural Networks?
A) Dropout
B) Xavier Initialization
C) Batch Normalization
D) Gradient Clipping
Answer: C
Batch Normalization standardizes the inputs to each layer for each mini-batch, allowing
for higher learning rates and faster convergence by reducing internal covariate
shift. Google Research: Batch Norm
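The core per-feature computation is sketched below (gamma and beta are the learnable scale and shift; the batch values are toy numbers):

```python
batch = [2.0, 4.0, 6.0, 8.0]   # one feature across a mini-batch
eps, gamma, beta = 1e-5, 1.0, 0.0

mean = sum(batch) / len(batch)
var = sum((x - mean) ** 2 for x in batch) / len(batch)
normalized = [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]

# Zero mean, unit variance, then scaled/shifted by gamma and beta.
print([round(v, 3) for v in normalized])  # [-1.342, -0.447, 0.447, 1.342]
```

Because each layer now sees inputs with a stable distribution, larger learning rates become safe.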
13. When using an LSTM (Long Short-Term Memory) network, what is the primary
function of the "Forget Gate"?
A) To store new information into the cell state.
B) To decide which information from the previous cell state should be discarded.
C) To output the final hidden state to the next layer.
D) To calculate the gradient for backpropagation.
Answer: B
The forget gate uses a sigmoid layer to output a number between 0 and 1, determining
how much of the previous long-term memory to keep.
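A sketch of that gating step with hypothetical gate pre-activations (in a real LSTM these would be W_f·[h_{t-1}, x_t] + b_f):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

prev_cell = [0.9, -0.4, 0.7]        # previous long-term memory (cell state)
gate_logits = [4.0, -4.0, 0.0]      # hypothetical forget-gate pre-activations

forget = [sigmoid(z) for z in gate_logits]          # values in (0, 1)
kept = [f * c for f, c in zip(forget, prev_cell)]   # elementwise gating

# ~0.98 keeps the first entry, ~0.02 nearly erases the second,
# and 0.5 keeps half of the third.
print([round(v, 3) for v in kept])
```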
14. In a Random Forest model, increasing the number of trees generally leads to:
A) Higher bias
B) Significant overfitting
C) Reduced variance without increasing bias
D) Faster training times
Answer: C
Each tree is trained on an independent bootstrap sample, so averaging more trees
reduces variance; because the individual trees remain unbiased, adding trees does not
increase the ensemble's bias.
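The variance-reduction effect can be sketched with generic noisy estimators standing in for individual trees (all numbers are hypothetical; real trees are correlated, so the reduction is smaller in practice):

```python
import random

random.seed(42)

def noisy_estimate():
    return 10.0 + random.gauss(0.0, 2.0)  # unbiased "tree", sd = 2

def averaged(n):
    return sum(noisy_estimate() for _ in range(n)) / n

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

single = [noisy_estimate() for _ in range(500)]
forest = [averaged(50) for _ in range(500)]

# Averaging 50 independent estimates shrinks the spread roughly by
# 1/sqrt(50), while the mean (hence the bias) stays the same.
print(sd(single), sd(forest))
```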