QCM Exam – Multiple Choice Questions Practice with Correct Answer

This document contains a QCM (Questions à Choix Multiples) exam designed to assess knowledge through structured multiple-choice questions. It includes practice questions covering key concepts, with a focus on accuracy, critical thinking, and exam-style problem solving commonly used in academic and professional assessments.


Question 1: In a deep neural network, which of the following best describes the primary cause of the vanishing gradient problem?

A) The use of ReLU activation functions causing dead neurons
B) Gradients becoming exponentially small as they propagate backward through many layers with sigmoid/tanh activations
C) The learning rate being set too high, causing oscillations around the optimum
D) Overfitting due to excessive model capacity relative to training data

Correct Answer: B

Explanation:

B is correct because: The vanishing gradient problem occurs primarily when using activation functions like sigmoid or tanh, whose derivatives are bounded between 0 and 0.25 (sigmoid) or 0 and 1 (tanh). During backpropagation, these small derivatives are multiplied together across many layers, causing gradients to shrink exponentially. For a network with n layers, the product of sigmoid derivatives alone is at most (0.25)^n, so the gradient reaching the early layers can be vanishingly small, making those layers learn extremely slowly or not at all.
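As a rough numerical illustration of this shrinkage (not part of the original exam; the layer count and pre-activation values are arbitrary choices), the sketch below multiplies one sigmoid-derivative factor per layer and prints the collapsing product:

```python
import numpy as np

def sigmoid_derivative(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), bounded above by 0.25."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

# Hypothetical pre-activation values at each of 20 layers.
rng = np.random.default_rng(0)
pre_activations = rng.normal(size=20)

# Backpropagation multiplies one derivative factor per layer; with sigmoid
# each factor is at most 0.25, so the product shrinks exponentially.
grad = 1.0
for depth, x in enumerate(pre_activations, start=1):
    grad *= sigmoid_derivative(x)
    print(f"layer {depth:2d}: surviving gradient factor = {grad:.3e}")
```

By around layer 20 the surviving factor is many orders of magnitude below 1, which is exactly the regime in which early layers stop learning.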

A is incorrect because: ReLU activation functions actually help mitigate vanishing gradients, not cause
them. The "dead neuron" problem with ReLU is a separate issue where neurons can become
permanently inactive if they consistently receive negative inputs, but this is distinct from vanishing
gradients.

C is incorrect because: High learning rates cause divergence or oscillation during optimization, but this is
unrelated to the mathematical mechanism of vanishing gradients, which concerns the magnitude of
computed gradients, not how they're applied during parameter updates.

D is incorrect because: Overfitting relates to generalization performance on unseen data, not to the
propagation of gradients during training. A model can overfit while still having healthy gradient flow, or
suffer from vanishing gradients while underfitting.



Question 2: A convolutional neural network uses 64 filters of size 3×3×3 (where the last dimension represents input channels) applied to an input feature map of dimensions 32×32×3 with stride 1 and padding 'same'. What is the output volume dimension?

A) 30×30×64
B) 32×32×64
C) 32×32×3
D) 30×30×3

Correct Answer: B

Explanation:

B is correct because: With "same" padding, the spatial dimensions are preserved. The formula for the output spatial dimension with stride s, padding p, and kernel size k is: output = (input - k + 2p)/s + 1. For a 32×32 input with a 3×3 kernel, stride 1, and padding of 1 pixel on each side, this gives (32 - 3 + 2)/1 + 1 = 32. The depth equals the number of filters (64), not the input channels. Thus: 32×32×64.
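To make the arithmetic concrete, here is a minimal sketch (the helper name is my own, not from the document) that applies the formula above for both padding choices:

```python
def conv_output_size(n: int, k: int, p: int, s: int) -> int:
    """Spatial output size of a convolution: (n - k + 2p) // s + 1."""
    return (n - k + 2 * p) // s + 1

# "same" padding for a 3x3 kernel at stride 1 means p = 1.
same = conv_output_size(n=32, k=3, p=1, s=1)   # -> 32
valid = conv_output_size(n=32, k=3, p=0, s=1)  # -> 30

# Output depth is the filter count, independent of input channels.
print(f"'same'  padding: {same}x{same}x64")    # 32x32x64 (answer B)
print(f"'valid' padding: {valid}x{valid}x64")  # 30x30x64 (option A)
```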

A is incorrect because: 30×30 would be the result of "valid" padding (no padding), calculated as (32 -
3)/1 + 1 = 30. However, the question specifies "same" padding, which preserves dimensions.

C is incorrect because: This maintains the spatial dimensions correctly but incorrectly preserves the
input depth (3 channels) rather than using the number of filters (64) as the output depth. Each filter
produces one output channel.

D is incorrect because: This combines both errors—using "valid" padding spatial dimensions (30×30)
while also incorrectly maintaining input channel depth (3) instead of filter count (64).



Question 3: In the Transformer architecture, what is the primary mathematical purpose of the scaling factor √d_k in the scaled dot-product attention mechanism Attention(Q, K, V) = softmax(QK^T / √d_k) V?

A) To normalize the attention weights so they sum to 1
B) To prevent the dot products from growing too large in magnitude, which would push the softmax function into regions with extremely small gradients
C) To ensure that the query and key matrices are orthogonal
D) To convert the attention scores into probability distributions

Correct Answer: B

Explanation:

B is correct because: When d_k (the dimension of keys/queries) is large, the dot products QK^T grow in magnitude because the sum involves more terms. For random vectors with mean 0 and variance 1, the dot product has variance d_k. Large dot product values push the softmax function into regions where it saturates (near 0 or 1), producing extremely small gradients that hinder learning. Dividing by √d_k normalizes the variance to approximately 1, maintaining stable gradients.
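A quick empirical check of this variance argument (not from the document; the dimension d_k and sample count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 512            # key/query dimension (arbitrary choice)
n = 10_000           # number of sampled query/key pairs

# Entries drawn i.i.d. with mean 0 and variance 1.
q = rng.normal(size=(n, d_k))
k = rng.normal(size=(n, d_k))

dots = np.sum(q * k, axis=1)     # raw dot products q·k
scaled = dots / np.sqrt(d_k)     # scaled as in the attention formula

print(f"raw dot-product variance    = {dots.var():.1f}")    # close to d_k = 512
print(f"scaled dot-product variance = {scaled.var():.2f}")  # close to 1.0
```

The raw variance grows linearly with d_k, while the scaled version stays near 1, keeping the softmax inputs in a range where its gradients remain usable.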

A is incorrect because: The softmax function itself ensures outputs sum to 1 through its normalization
(dividing by the sum of exponentials). The scaling factor is applied before the softmax, so it doesn't
serve this normalization purpose.

C is incorrect because: The scaling factor doesn't enforce or encourage orthogonality between Q and K
matrices. Orthogonality would require specific constraints on the weight matrices during training, not a
simple scaling of dot products.

D is incorrect because: The conversion to probability distributions is accomplished by the softmax
function's exponential and normalization operations, not by the scaling factor. The scaling occurs before
this conversion and serves a different purpose.



Question 4: Which regularization technique explicitly constrains the L2 norm of the incoming weight vector for each neuron to be exactly equal to a fixed constant (typically 1)?

A) L2 regularization (weight decay)
B) Dropout
C) Batch Normalization
D) Weight Normalization
