IT 350 Artificial Intelligence & Machine Learning
Comprehensive Final Exam (Questions & Answers)
2025
Question 1 (Multiple Choice)
Question:
Which optimization algorithm adapts its learning rates based on
estimates of first and second moments of gradients, making it
especially effective in training deep neural networks?
A) Stochastic Gradient Descent (SGD)
B) RMSProp
C) Adam
D) Adagrad
Correct Answer:
C) Adam
Rationale:
Adam (Adaptive Moment Estimation) calculates individual
adaptive learning rates for each parameter by using estimates of
the first (mean) and second (uncentered variance) moments of the
gradients. In practice this often yields faster and more stable
convergence on deep learning tasks than standard SGD or RMSProp.
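For illustration, a minimal NumPy sketch of a single Adam update follows
(the hyperparameter defaults come from the original Adam paper; the
function name adam_step and the standalone framing are illustrative, not
taken from any particular library):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased estimates of the first moment (mean) and the
    # second moment (uncentered variance) of the gradients.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Correct the bias introduced by initializing m and v at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step: a large second moment shrinks the step.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

Here t is the 1-indexed step count, and m and v start as zero arrays
shaped like theta.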
---
Question 2 (Fill in the Blank)
Question:
The process of leveraging a pre-trained model on one task and
adapting it to a related but different task is known as ________.
Correct Answer:
transfer learning
Rationale:
Transfer learning uses the knowledge gained from a source task to
improve learning in a target task. This technique is especially
useful when labeled data for the target task is limited and is
commonly applied in deep learning for computer vision and NLP.
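A minimal sketch of the idea, assuming PyTorch and torchvision are
available (the 10-class target task is an arbitrary example):

import torch.nn as nn
from torchvision import models

# Source task: ImageNet classification. Load the pre-trained weights.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Target task: replace the classification head (here, 10 classes).
# Only this new layer is trained on the (often small) target dataset.
model.fc = nn.Linear(model.fc.in_features, 10)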
---
Question 3 (True/False)
Question:
True/False: Transformers rely purely on self-attention
mechanisms and completely abandon recurrence and convolution
in order to capture long-range dependencies in data.
Correct Answer:
True
Rationale:
Transformer architectures are built entirely on self-attention
mechanisms, allowing them to model relationships between all
input positions without using recurrent or convolutional layers.
This design lets them capture long-range dependencies efficiently
while permitting far greater parallelism during training than
recurrent architectures.
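A minimal NumPy sketch of the scaled dot-product self-attention at the
core of this design (the projection matrices Wq, Wk, and Wv are assumed
to have been learned elsewhere):

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project the input sequence (seq_len x d_model) to queries, keys, values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every position scores every other position in one matrix product,
    # so long-range dependencies require no recurrence or convolution.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V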
---
Question 4 (Multiple Response)