Exam (elaborations)

QCM Exam – Multiple Choice Questions Practice with Correct Answer

Pages: 13
Grade: A+
Uploaded on: 17-03-2026
Written in: 2025/2026

This document contains a QCM (Questions à Choix Multiples, i.e. multiple-choice) exam designed to assess knowledge through structured multiple-choice questions. It includes practice questions covering key concepts, with a focus on accuracy, critical thinking, and exam-style problem solving commonly used in academic and professional assessments.

Institution: QCM
Course: QCM

Content preview

QCM Exam – Multiple Choice Questions Practice with Correct Answer

Question 1: In a deep neural network, which of the following best describes the primary cause of the
vanishing gradient problem?

A) The use of ReLU activation functions causing dead neurons
B) Gradients becoming exponentially small as they propagate backward through many layers with sigmoid/tanh activations
C) The learning rate being set too high, causing oscillations around the optimum
D) Overfitting due to excessive model capacity relative to training data

Correct Answer: B

Explanation:

B is correct because: The vanishing gradient problem occurs primarily when using activation functions like sigmoid or tanh, whose derivatives are bounded between 0 and 0.25 (sigmoid) or 0 and 1 (tanh). During backpropagation, these small derivatives are multiplied together across many layers, causing gradients to shrink exponentially. For a network with n sigmoid layers, gradients can diminish by a factor of up to approximately (0.25)^n, making early layers learn extremely slowly or not at all.

A is incorrect because: ReLU activation functions actually help mitigate vanishing gradients, not cause
them. The "dead neuron" problem with ReLU is a separate issue where neurons can become
permanently inactive if they consistently receive negative inputs, but this is distinct from vanishing
gradients.

C is incorrect because: High learning rates cause divergence or oscillation during optimization, but this is
unrelated to the mathematical mechanism of vanishing gradients, which concerns the magnitude of
computed gradients, not how they're applied during parameter updates.

D is incorrect because: Overfitting relates to generalization performance on unseen data, not to the
propagation of gradients during training. A model can overfit while still having healthy gradient flow, or
suffer from vanishing gradients while underfitting.
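The exponential shrinkage described for option B can be illustrated numerically. A minimal sketch (the layer counts are arbitrary; the (0.25)^n bound follows from the sigmoid derivative's maximum of 0.25):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value 0.25, attained at x = 0

# Worst-case bound on the backpropagated gradient factor after n
# sigmoid layers: at most (0.25)^n, shrinking exponentially with depth.
for n in (5, 10, 20):
    print(f"{n:2d} layers: gradient factor <= {0.25 ** n:.2e}")
```

Even at 20 layers the bound is below 1e-12, which is why early layers in deep sigmoid networks barely update.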



Question 2: A convolutional neural network uses 64 filters of size 3×3×3 (where the last dimension
represents input channels) applied to an input feature map of dimensions 32×32×3 with stride 1 and
padding 'same'. What is the output volume dimension?

A) 30×30×64
B) 32×32×64
C) 32×32×3
D) 30×30×3

Correct Answer: B

Explanation:

B is correct because: With "same" padding, the spatial dimensions are preserved. The formula for the output spatial dimension with stride s, padding p, and kernel size k is: output = (input - k + 2p)/s + 1. For a 32×32 input with a 3×3 kernel and stride 1, padding of 1 pixel on each side gives (32 - 3 + 2)/1 + 1 = 32. The output depth equals the number of filters (64), not the number of input channels. Thus: 32×32×64.

A is incorrect because: 30×30 would be the result of "valid" padding (no padding), calculated as (32 -
3)/1 + 1 = 30. However, the question specifies "same" padding, which preserves dimensions.

C is incorrect because: This maintains the spatial dimensions correctly but incorrectly preserves the
input depth (3 channels) rather than using the number of filters (64) as the output depth. Each filter
produces one output channel.

D is incorrect because: This combines both errors—using "valid" padding spatial dimensions (30×30)
while also incorrectly maintaining input channel depth (3) instead of filter count (64).
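The padding arithmetic above can be checked with a small helper. A minimal sketch, assuming TensorFlow-style "same"/"valid" padding semantics (the function name `conv_output_shape` is illustrative, not from any library):

```python
import math

def conv_output_shape(h, w, c_in, k, n_filters, stride=1, padding="same"):
    """Output (H, W, C) of a conv layer with a square k x k kernel.

    c_in does not affect the output shape: each filter spans all input
    channels and produces exactly one output channel.
    """
    if padding == "same":
        # 'same' pads just enough to give ceil(input / stride).
        out_h = math.ceil(h / stride)
        out_w = math.ceil(w / stride)
    else:
        # 'valid': no padding, kernel must fit entirely inside the input.
        out_h = (h - k) // stride + 1
        out_w = (w - k) // stride + 1
    return (out_h, out_w, n_filters)

print(conv_output_shape(32, 32, 3, 3, 64))                    # (32, 32, 64)
print(conv_output_shape(32, 32, 3, 3, 64, padding="valid"))   # (30, 30, 64)
```

The two printed shapes correspond to options B ("same") and the 30×30 spatial size of "valid" padding discussed for option A.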



Question 3: In the Transformer architecture, what is the primary mathematical purpose of the scaling factor √d_k in the scaled dot-product attention mechanism Attention(Q, K, V) = softmax(QK^T / √d_k)V?

A) To normalize the attention weights so they sum to 1
B) To prevent the dot products from growing too large in magnitude, which would push the softmax function into regions with extremely small gradients
C) To ensure that the query and key matrices are orthogonal
D) To convert the attention scores into probability distributions

Correct Answer: B

Explanation:

B is correct because: When d_k (the dimension of the keys/queries) is large, the dot products QK^T grow in magnitude because the sum involves more terms. For random vectors with mean 0 and variance 1, each dot product has variance d_k. Large dot-product values push the softmax function into regions where it saturates (outputs near 0 or 1), producing extremely small gradients that hinder learning. Dividing by √d_k normalizes the variance to approximately 1, maintaining stable gradients.

A is incorrect because: The softmax function itself ensures outputs sum to 1 through its normalization
(dividing by the sum of exponentials). The scaling factor is applied before the softmax, so it doesn't
serve this normalization purpose.

C is incorrect because: The scaling factor doesn't enforce or encourage orthogonality between Q and K
matrices. Orthogonality would require specific constraints on the weight matrices during training, not a
simple scaling of dot products.

D is incorrect because: The conversion to probability distributions is accomplished by the softmax
function's exponential and normalization operations, not by the scaling factor. The scaling occurs before
this conversion and serves a different purpose.
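The scaling's effect on the score variance can be demonstrated directly. A minimal NumPy sketch (the function name, the random seed, and the dimensions 8×64 are illustrative assumptions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # scale before the softmax
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
d_k = 64
Q, K, V = (rng.standard_normal((8, d_k)) for _ in range(3))

# Unscaled scores have variance ~ d_k; dividing by sqrt(d_k) restores ~ 1.
print(np.var(Q @ K.T))                # roughly d_k = 64
print(np.var(Q @ K.T / np.sqrt(d_k)))  # roughly 1

out, weights = scaled_dot_product_attention(Q, K, V)
print(out.shape)                      # (8, 64)
```

Note that the softmax itself (the exponentiate-and-normalize step) is what makes each row of `weights` sum to 1, matching the explanations for options A and D.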



Question 4: Which regularization technique explicitly constrains the L2 norm of the incoming weight
vector for each neuron to be exactly equal to a fixed constant (typically 1)?

A) L2 regularization (weight decay)
B) Dropout
C) Batch Normalization
D) Weight Normalization
