100-Question Real Exam with Answers & Rationales
Overview:
This comprehensive 100-question practice exam is designed for students preparing for CS7643
– Deep Learning (Quiz 2). It focuses on the most-tested concepts, frequently searched topics,
and high-yield questions, providing a realistic preparation tool.
Key topics covered include:
Neural Network Fundamentals: MLPs, activation functions (ReLU, Sigmoid, Tanh), weight initialization (Xavier, He), and loss functions (Cross-Entropy, MSE, Hinge).
Optimization Techniques: SGD, Momentum, RMSProp, Adam, learning rate strategies, and gradient issues (vanishing/exploding gradients).
Convolutional Neural Networks (CNNs): Convolution, kernel/stride/padding, pooling layers, and parameter counting.
Recurrent Neural Networks (RNNs): Vanilla RNN, LSTM, GRU, gating mechanisms, and BPTT.
Regularization and Stabilization: Dropout, L1/L2 regularization, Batch Normalization, and residual connections.
Practical PyTorch Concepts: Forward/backward passes, layer parameters, and implementation best practices.
Each question is followed by the correct answer and a rationale.
1. Xavier initialization is best for which activation functions?
A. ReLU
B. Sigmoid and Tanh
C. Softmax
D. Leaky ReLU
Answer: B. Rationale: Xavier (Glorot) initialization keeps the variance of activations and gradients roughly constant across layers, under the assumption of activations that are approximately linear near zero, such as sigmoid and tanh.
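A minimal PyTorch sketch of Xavier initialization feeding a tanh layer, assuming PyTorch is available; the layer sizes and batch size are arbitrary, chosen only for illustration:

```python
import torch
import torch.nn as nn

# Linear layer followed by tanh; sizes are arbitrary for illustration.
layer = nn.Linear(256, 128)

# Xavier/Glorot initialization: variance ~ 2 / (fan_in + fan_out),
# with the recommended gain for tanh.
nn.init.xavier_uniform_(layer.weight, gain=nn.init.calculate_gain('tanh'))
nn.init.zeros_(layer.bias)

x = torch.randn(32, 256)       # batch of 32 random inputs
out = torch.tanh(layer(x))
print(out.var().item())        # activation variance stays in a reasonable range
```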
2. He initialization is preferred for:
A. Sigmoid
B. ReLU
C. Tanh
D. Softmax
Answer: B. Rationale: He (Kaiming) initialization scales the weight variance by 2/fan_in to compensate for ReLU zeroing out roughly half of its inputs, keeping activation variance stable.
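A matching sketch for He (Kaiming) initialization feeding a ReLU layer, again with arbitrary illustrative sizes:

```python
import torch
import torch.nn as nn

# Linear layer followed by ReLU; sizes are arbitrary for illustration.
layer = nn.Linear(256, 128)

# He/Kaiming initialization: variance ~ 2 / fan_in, compensating for ReLU
# zeroing out roughly half of its inputs.
nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
nn.init.zeros_(layer.bias)

x = torch.randn(32, 256)
out = torch.relu(layer(x))
print(out.var().item())
```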
3. Which activation is zero-centered?
A. Sigmoid
B. Tanh
C. ReLU
D. Softmax
Answer: B. Rationale: Tanh outputs lie in (-1, 1) and are centered around zero, unlike sigmoid.
4. Sigmoid output range:
A. (-1,1)
B. (0, ∞)
C. (0,1)
D. (-∞, ∞)
Answer: C. Rationale: Sigmoid maps any real number into (0, 1), which makes it suitable for representing probabilities.
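A short sketch illustrating questions 3 and 4, evaluating both activations on a range of inputs; the input grid is arbitrary:

```python
import torch

x = torch.linspace(-5.0, 5.0, steps=11)

tanh_out = torch.tanh(x)       # values in (-1, 1), symmetric around 0
sig_out = torch.sigmoid(x)     # values in (0, 1), never negative

print(tanh_out.min().item(), tanh_out.max().item())  # roughly -1 ... 1
print(sig_out.min().item(), sig_out.max().item())    # roughly  0 ... 1
print(tanh_out.mean().item())  # near 0 (zero-centered)
print(sig_out.mean().item())   # near 0.5 (not zero-centered)
```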
5. ReLU derivative for x > 0:
A. 0
B. 1
C. x
D. Undefined
Answer: B. Rationale: For x > 0, ReLU(x) = x, so its derivative is 1; for x < 0 the derivative is 0.
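A quick check of the ReLU derivative using PyTorch autograd; the evaluation points 2.0 and -3.0 are arbitrary:

```python
import torch

# Evaluate ReLU and its gradient at a positive and a negative point.
x = torch.tensor([2.0, -3.0], requires_grad=True)
y = torch.relu(x).sum()
y.backward()

print(x.grad)  # tensor([1., 0.]): derivative is 1 for x > 0, 0 for x < 0
```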
6. Cross-entropy loss is used for:
A. Regression
B. Classification
C. Clustering
D. Autoencoders
Answer: B. Rationale: Cross-entropy measures the dissimilarity between the predicted probability distribution and the true labels, making it the standard loss for classification.
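A minimal usage sketch of cross-entropy loss in PyTorch; the number of samples, classes, and the labels are made up for illustration:

```python
import torch
import torch.nn as nn

# Toy classification setup: 4 samples, 3 classes (numbers are arbitrary).
logits = torch.randn(4, 3)            # raw scores; no softmax needed
targets = torch.tensor([0, 2, 1, 2])  # integer class labels

# nn.CrossEntropyLoss combines log-softmax and negative log-likelihood.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)
print(loss.item())
```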
7. Mean Squared Error (MSE) is used for:
A. Classification
B. Regression
C. Softmax
D. Hinge loss
Answer: B. Rationale: MSE penalizes the squared difference between predictions and continuous targets, making it the standard loss for regression.
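A matching sketch for MSE on made-up continuous predictions and targets:

```python
import torch
import torch.nn as nn

# Toy regression setup: 4 continuous predictions vs. targets (arbitrary values).
preds = torch.tensor([2.5, 0.0, 1.8, 7.1])
targets = torch.tensor([3.0, -0.5, 2.0, 7.0])

criterion = nn.MSELoss()     # mean of squared differences
loss = criterion(preds, targets)
print(loss.item())           # ((0.5)^2 + (0.5)^2 + (0.2)^2 + (0.1)^2) / 4
```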
8. Which optimizer adapts learning rates per parameter?
A. SGD
B. Momentum
C. Adam
D. RMSProp
Answer: C. Rationale: Adam maintains per-parameter estimates of both the first and second moments of the gradients and uses them to adapt each parameter's step size. (RMSProp also adapts per-parameter rates, but tracks only the second moment.)
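A minimal sketch of one Adam update step on a toy linear model; the model size, learning rate, and data are arbitrary:

```python
import torch
import torch.nn as nn

# Tiny model and a single optimization step with Adam.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.MSELoss()(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # per-parameter step sizes come from running 1st/2nd moment estimates
```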
9. Momentum in gradient descent helps:
A. Prevent overfitting
B. Reduce batch size
C. Smooth and accelerate updates
D. Normalize activations
Answer: C. Rationale: Momentum accumulates an exponentially weighted average of past gradients, smoothing updates and accelerating progress along consistent descent directions.
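A matching sketch using SGD with momentum; the hyperparameters are arbitrary illustrative values:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# SGD with momentum keeps a velocity buffer per parameter:
# v <- mu * v + grad, then p <- p - lr * v.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.MSELoss()(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```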