Actual Exam (100 Questions with Answers and
Rationales)
Overview:
This exam is a full-length, 100-question practice test designed to evaluate a student’s
understanding of graduate-level deep learning concepts. It covers core topics including:
Neural Network Architectures: Feedforward (MLP), Convolutional (CNN), Recurrent
(RNN), LSTM, and GRU networks.
Activation Functions: ReLU, Leaky ReLU, Sigmoid, Tanh, and Softmax, including
their properties and derivatives.
Optimization Methods: SGD, Momentum, RMSProp, Adam, learning rate adjustments,
and adaptive optimization techniques.
Regularization and Stabilization: Dropout, L1/L2 weight penalties, Batch
Normalization, weight initialization (Xavier/He), and gradient clipping.
Gradient Challenges: Vanishing and exploding gradients, and techniques such as
residual connections to address them.
Loss Functions: Cross-entropy, MSE, and Hinge loss for classification and regression
tasks.
Practical Applications: Forward/backward pass calculations, parameter counting, and
PyTorch implementation examples.
Each question lists the correct answer and a rationale, making this practice exam an excellent
tool for reviewing concepts, identifying knowledge gaps, and preparing for real assessments in
CS7643 Deep Learning.
1. Which of the following is NOT a property of ReLU?
A. Non-linear
B. Unbounded above
C. Smooth and differentiable everywhere
D. Encourages sparsity
Answer: C
Rationale: ReLU is non-linear and unbounded above; it zeroes out negative inputs
(encouraging sparse activations) but is not differentiable at 0.
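As a quick check of option C, here is a minimal PyTorch sketch (assuming a standard torch install) showing that autograd falls back to a subgradient at x = 0 rather than a true derivative:

```python
import torch

# ReLU is piecewise linear: max(0, x). Its derivative is 0 for x < 0 and 1 for x > 0,
# but it is undefined at exactly x = 0; the framework picks a subgradient there.
x = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
y = torch.relu(x).sum()
y.backward()
print(x.grad)  # tensor([0., 0., 1.]) -- PyTorch uses subgradient 0 at x = 0
```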
2. In batch normalization, the γ and β parameters are used to:
A. Normalize inputs to zero mean and unit variance
B. Scale and shift normalized inputs
C. Reduce overfitting directly
D. Speed up gradient computation
Answer: B
Rationale: γ and β allow the network to rescale and shift the normalized activations,
restoring representational flexibility after normalization.
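A short PyTorch illustration of where γ and β live: nn.BatchNorm1d exposes them as the learnable weight and bias parameters.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)      # gamma -> bn.weight, beta -> bn.bias
x = torch.randn(8, 4)                    # batch of 8 samples, 4 features
out = bn(x)                              # normalize per feature, then scale by gamma and shift by beta
print(bn.weight.shape, bn.bias.shape)    # torch.Size([4]) torch.Size([4])
```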
3. Adam optimizer combines:
A. SGD and momentum
B. RMSProp only
C. Momentum and adaptive learning rates
D. L2 regularization
Answer: C
Rationale: Adam maintains first-moment (momentum-like) and second-moment
(RMSProp-like) estimates of the gradients.
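The moment updates can be sketched in a few lines; this is an illustrative single-step update with the usual default hyperparameters, not the torch.optim.Adam implementation itself:

```python
import torch

# Minimal sketch of one Adam-style parameter update (illustrative only).
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
w = torch.tensor(0.5)
m = torch.tensor(0.0)   # first moment: exponential average of gradients (momentum-like)
v = torch.tensor(0.0)   # second moment: exponential average of squared gradients (RMSProp-like)

grad = torch.tensor(0.2)                 # pretend gradient from backprop
m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad ** 2
m_hat, v_hat = m / (1 - beta1), v / (1 - beta2)   # bias correction for step t = 1
w = w - lr * m_hat / (v_hat.sqrt() + eps)
```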
4. Dropout primarily helps:
A. Accelerate training
B. Reduce overfitting
C. Improve ReLU performance
D. Initialize weights
Answer: B
Rationale: Dropout randomly zeroes activations during training to prevent co-adaptation
of units.
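A brief PyTorch example showing that dropout is only active in training mode (PyTorch uses inverted dropout, so surviving activations are rescaled):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the entries zeroed, survivors scaled by 1 / (1 - p) = 2

drop.eval()
print(drop(x))   # identity at evaluation time: all ones
```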
5. The standard loss for multi-class classification is:
A. MSE
B. Cross-entropy
C. Hinge loss
D. KL divergence
Answer: B
Rationale: Cross-entropy compares the predicted class probabilities with the one-hot labels.
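A minimal PyTorch usage example; note that nn.CrossEntropyLoss expects raw logits and integer class indices (it applies softmax and the negative log-likelihood internally):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()    # softmax + negative log-likelihood in one call
logits = torch.randn(4, 3)           # 4 samples, 3 classes (raw scores, not probabilities)
targets = torch.tensor([0, 2, 1, 2]) # class indices, not one-hot vectors
loss = criterion(logits, targets)
print(loss.item())
```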
6. Vanishing gradients in RNNs lead to:
A. Faster training
B. Inability to learn long-term dependencies
C. Overfitting
D. Weight explosion
Answer: B
Rationale: Multiplying many small derivatives across time steps shrinks the gradient
magnitude exponentially, so distant dependencies receive almost no learning signal.
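A toy numeric illustration of the effect, assuming a per-step derivative magnitude of 0.25 (the maximum slope of the sigmoid):

```python
# Repeatedly multiplying derivatives bounded by 0.25 shrinks the gradient
# exponentially with the number of unrolled time steps.
grad = 1.0
for _ in range(50):     # 50 time steps
    grad *= 0.25        # hypothetical per-step derivative magnitude
print(grad)             # ~7.9e-31, effectively zero
```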
7. Key advantage of LSTM over vanilla RNN:
A. Faster
B. Fewer parameters
C. Capture long-term dependencies
D. Simpler architecture
Answer: C
Rationale: LSTM gates preserve gradient flow through the cell state, enabling learning
across long sequences.
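A small PyTorch sketch of an LSTM's outputs; the returned cell state is what carries information across many time steps:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(2, 35, 10)              # batch of 2, 35 time steps, 10 features
out, (h_n, c_n) = lstm(x)               # cell state c_n carries long-range information
print(out.shape, h_n.shape, c_n.shape)  # (2, 35, 20), (1, 2, 20), (1, 2, 20)
```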
8. Sigmoid activation is used in:
A. Hidden layers of CNNs
B. ReLU replacement
C. Binary classification output
D. Softmax replacement
Answer: C
Rationale: Sigmoid maps the output to (0, 1), giving it a probability interpretation for the
positive class.
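A quick PyTorch example; in practice the sigmoid is usually folded into the loss via nn.BCEWithLogitsLoss for numerical stability:

```python
import torch
import torch.nn as nn

logit = torch.tensor([0.8])       # raw score from the final linear layer
prob = torch.sigmoid(logit)       # squashed into (0, 1), read as P(class = 1)
print(prob)                       # tensor([0.6900]) approximately

# BCEWithLogitsLoss applies the sigmoid internally, so it takes the raw logit.
loss = nn.BCEWithLogitsLoss()(logit, torch.tensor([1.0]))
```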
9. Xavier initialization aims to:
A. Set all weights to zero
B. Keep activations’ variance stable across layers
C. Prevent overfitting
D. Speed up ReLU convergence
Answer: B
Rationale: Xavier initialization balances signal variance across both the forward and
backward passes.
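A short sketch using PyTorch's built-in initializer; with the default gain, the resulting weight variance is approximately 2 / (fan_in + fan_out):

```python
import torch
import torch.nn as nn

layer = nn.Linear(256, 128)
nn.init.xavier_uniform_(layer.weight)  # bound chosen from fan_in and fan_out
print(layer.weight.var().item())       # close to 2 / (256 + 128) ≈ 0.0052
```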