**CS 7643 – Deep Learning**
Questions and Answers | 2026 Updates
Quiz 1:
Coverage Areas
Neural network fundamentals
Linear algebra & calculus for deep learning
Loss functions
Optimization & gradient descent
Activation functions
Overfitting & generalization
Bias–variance tradeoff
Section A: Multiple Choice
1. What is the primary role of an activation function in a neural network?
A. To reduce overfitting
B. To introduce non-linearity
C. To normalize inputs
D. To minimize loss
Answer: B
Explanation: Without non-linearity, a neural network collapses into a linear model regardless of depth.
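A two-line NumPy check makes this concrete: composing two linear layers with no activation in between reduces, by matrix associativity, to a single linear map. This is a minimal sketch with arbitrary random shapes, not course code.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(3, 5))

two_layers = W2 @ (W1 @ x)     # "deep" stack with no activation between layers
one_layer = (W2 @ W1) @ x      # a single equivalent linear map
print(np.allclose(two_layers, one_layer))   # True: depth bought nothing
```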
2. Which activation function is most likely to suffer from the vanishing gradient problem?
A. ReLU
B. Leaky ReLU
C. Sigmoid
D. ELU
Answer: C
Explanation: Sigmoid saturates at extreme values, causing gradients to approach zero.
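To see the saturation numerically, here is a minimal NumPy sketch of the sigmoid derivative, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 and decays toward zero for large |x|:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x = 0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   sigmoid'(x) = {sigmoid_grad(x):.6f}")
# x = 0 gives 0.25; x = 10 gives about 4.5e-05, effectively zero
```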
3. What does the gradient of the loss function represent?
A. The value of the loss
B. The curvature of the loss surface
C. The direction of steepest increase in loss
D. The direction of steepest decrease in loss
Answer: C
Explanation: Gradient points in the direction of maximum increase; optimization moves in the opposite
direction.
4. Which of the following best describes stochastic gradient descent (SGD)?
A. Uses the entire dataset for each update
B. Uses one data point per update
C. Uses no gradients
D. Uses second-order derivatives
Answer: B
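As an illustration of the one-sample update, here is a minimal NumPy sketch fitting a single slope w by strict SGD; the data and step size are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + rng.normal(scale=0.1, size=100)   # true slope is 3

w, eta = 0.0, 0.05
for epoch in range(5):
    for i in rng.permutation(len(X)):         # one data point per update
        grad = 2 * (w * X[i] - y[i]) * X[i]   # gradient of (w*x - y)^2
        w -= eta * grad
print(f"learned w = {w:.3f}")                 # close to 3.0
```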
5. Increasing model capacity generally increases which risk?
A. Bias
B. Underfitting
C. Overfitting
D. Variance reduction
Answer: C
Section B: Short Answer
6. Why can deep networks approximate complex functions better than shallow networks?
Answer:
Because depth allows the composition of multiple non-linear transformations, enabling hierarchical
feature learning and more efficient representation of complex functions.
7. What is the bias–variance tradeoff?
Answer:
Bias measures error from overly simplistic assumptions, while variance measures sensitivity to training
data. Increasing model complexity reduces bias but increases variance.
8. Why is ReLU preferred over sigmoid in deep networks?
Answer:
ReLU avoids saturation for positive values, mitigates vanishing gradients, and is computationally
efficient.
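A rough sketch of the depth effect, assuming every unit happens to see a pre-activation of 1.0: the product of sigmoid derivatives across 20 layers is vanishingly small, while ReLU's derivative is exactly 1 for positive inputs.

```python
import numpy as np

depth = 20
z = np.full(depth, 1.0)                 # assumed pre-activation at each layer

s = 1.0 / (1.0 + np.exp(-z))
sigmoid_chain = np.prod(s * (1.0 - s))  # product of 20 sigmoid derivatives
relu_chain = np.prod((z > 0).astype(float))  # ReLU derivative is 1 for z > 0

print(f"sigmoid chain: {sigmoid_chain:.2e}")  # about 7e-15
print(f"ReLU chain:    {relu_chain:.2e}")     # 1.00e+00
```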
9. What happens if the learning rate is too large?
Answer:
The optimization may diverge or oscillate, failing to converge to a minimum.
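This is easy to see on the quadratic L(w) = w^2, where each gradient descent step multiplies w by (1 - 2*eta); a minimal sketch:

```python
# Gradient descent on L(w) = w^2, gradient 2w: converges only if eta < 1.
def run(eta, steps=10, w=1.0):
    for _ in range(steps):
        w -= eta * 2 * w
    return w

print(run(eta=0.1))   # shrinks toward 0 (converges)
print(run(eta=1.1))   # explodes: (1 - 2*1.1)^10 = (-1.2)^10, about 6.19
```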
10. What is overfitting, and how can it be reduced?
Answer:
Overfitting occurs when a model fits training data too closely and generalizes poorly. It can be reduced
using regularization, dropout, early stopping, and more data.
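As a sketch of two of these remedies, with hypothetical placeholder tensors h and W standing in for a real layer, inverted dropout and an L2 penalty look like this in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)        # stand-in hidden-layer activations
W = rng.normal(size=(8, 8))   # stand-in weight matrix

# Inverted dropout: zero out units at train time and rescale so the
# expected activation is unchanged; at test time use h as-is.
p_keep = 0.8
mask = (rng.random(h.shape) < p_keep) / p_keep
h_train = h * mask

# L2 regularization: penalize large weights on top of the task loss.
lam = 1e-4
data_loss = 0.0               # placeholder for the actual task loss
total_loss = data_loss + lam * np.sum(W ** 2)
print(total_loss)
```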
Section C: Mathematical Foundations
11. Given loss \( L = (y - \hat{y})^2 \), compute \( \frac{dL}{d\hat{y}} \).
Answer:
\[
\frac{dL}{d\hat{y}} = -2(y - \hat{y})
\]
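A quick finite-difference check confirms the analytic derivative; the values y = 3.0 and ŷ = 2.5 are arbitrary test points:

```python
y, y_hat, eps = 3.0, 2.5, 1e-6

L = lambda yh: (y - yh) ** 2
numeric = (L(y_hat + eps) - L(y_hat - eps)) / (2 * eps)  # central difference
analytic = -2 * (y - y_hat)

print(numeric, analytic)  # both are -1.0 (up to floating-point error)
```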
12. What is the gradient descent update rule?
Answer:
\[
\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)
\]
where \( \eta \) is the learning rate.
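Translated directly into code, the update rule looks like the following minimal sketch, applied to L(θ) = (θ - 4)^2 whose gradient is 2(θ - 4):

```python
theta, eta = 0.0, 0.1
for t in range(50):
    grad = 2 * (theta - 4.0)     # gradient of L(theta) = (theta - 4)^2
    theta = theta - eta * grad   # the update rule
print(theta)                     # approximately 4.0, the minimizer
```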
13. What does a zero gradient indicate at a point in optimization?
Answer:
The point may be a local minimum, local maximum, or saddle point.
14. Why are saddle points problematic in deep learning?
Answer:
They can trap optimization algorithms because gradients are near zero, slowing learning.
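The classic example is f(x, y) = x^2 - y^2, whose gradient vanishes at the origin even though the origin is neither a minimum nor a maximum; a minimal sketch of gradient descent stalling near it:

```python
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])   # gradient of f(x, y) = x^2 - y^2

p = np.array([1e-8, 1e-8])   # start a hair away from the saddle at the origin
for _ in range(50):
    p = p - 0.1 * grad(p)
print(p)   # x has shrunk to ~1e-13, y has crept up to only ~9e-5: slow escape
```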
Section D: Conceptual / Scenario-Based
15. A model performs well on training data but poorly on validation data. What is happening?
Answer:
The model is overfitting.
16. You notice training loss oscillates wildly. What is the most likely cause?
Answer:
The learning rate is too high.
17. Why is normalization often applied to input features?
Answer:
To stabilize gradients, speed up convergence, and improve numerical stability.
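A minimal sketch of the standard recipe, standardizing each feature to zero mean and unit variance using training-set statistics (the toy matrix here is arbitrary):

```python
import numpy as np

X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8   # epsilon guards against zero variance

X_norm = (X_train - mu) / sigma      # reuse the same mu/sigma on test data
print(X_norm.mean(axis=0))           # approximately [0, 0]
print(X_norm.std(axis=0))            # approximately [1, 1]
```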
18. Why do we initialize weights randomly instead of setting them to zero?
Answer:
Zero initialization causes neurons to learn identical features due to symmetry.
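A minimal sketch of the symmetry argument, using a constant initialization (zero is the degenerate special case) in a tiny two-layer network with made-up sizes: every hidden unit computes the same activation and receives the same gradient row, so the units can never differentiate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = 1.0

W1 = np.full((3, 4), 0.5)     # every hidden unit starts with identical weights
w2 = np.full(3, 0.5)

h = np.tanh(W1 @ x)           # all three hidden activations are equal
y_hat = w2 @ h
delta = 2 * (y_hat - y)       # d/dy_hat of the squared loss

grad_W1 = np.outer(delta * w2 * (1.0 - h ** 2), x)
print(h)        # three identical values
print(grad_W1)  # three identical rows: the units move in lockstep forever
```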
Section E: True / False
19. Deeper networks always outperform shallow networks.
Answer: False
20. SGD introduces noise that can help escape local minima.
Answer: True
Below is a practice Quiz 2 for Georgia Tech CS 7643 – Deep Learning, aligned with how the course typically escalates in difficulty after Quiz 1.
**CS 7643 – Deep Learning**