QCM Exam – Multiple Choice Questions Practice with Correct Answer

This document contains a QCM (Questions à Choix Multiples) exam designed to assess knowledge through structured multiple-choice questions. It includes practice questions covering key concepts, with a focus on accuracy, critical thinking, and exam-style problem solving commonly used in academic and professional assessments.


Question 1: In a deep neural network, which of the following best describes the primary cause of the vanishing gradient problem?

A) The use of ReLU activation functions causing dead neurons
B) Gradients becoming exponentially small as they propagate backward through many layers with sigmoid/tanh activations
C) The learning rate being set too high, causing oscillations around the optimum
D) Overfitting due to excessive model capacity relative to training data

Correct Answer: B

Explanation:

B is correct because: The vanishing gradient problem occurs primarily when using activation functions like sigmoid or tanh, whose derivatives are bounded between 0 and 0.25 (sigmoid) or 0 and 1 (tanh). During backpropagation, these small derivatives are multiplied together across many layers, causing gradients to shrink exponentially. For a network with n layers, the product of sigmoid derivatives alone is at most (0.25)^n, so the gradient reaching the early layers can be vanishingly small, making those layers learn extremely slowly or not at all.
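As a rough numerical illustration of this shrinkage (not part of the original exam; the layer count and pre-activation values are arbitrary choices), the sketch below multiplies one sigmoid-derivative factor per layer and prints the collapsing product:

```python
import numpy as np

def sigmoid_derivative(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), bounded above by 0.25."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

# Hypothetical pre-activation values at each of 20 layers.
rng = np.random.default_rng(0)
pre_activations = rng.normal(size=20)

# Backpropagation multiplies one derivative factor per layer; with sigmoid
# each factor is at most 0.25, so the product shrinks exponentially.
grad = 1.0
for depth, x in enumerate(pre_activations, start=1):
    grad *= sigmoid_derivative(x)
    print(f"layer {depth:2d}: surviving gradient factor = {grad:.3e}")
```

By around layer 20 the surviving factor is many orders of magnitude below 1, which is exactly the regime in which early layers stop learning.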

A is incorrect because: ReLU activation functions actually help mitigate vanishing gradients, not cause
them. The "dead neuron" problem with ReLU is a separate issue where neurons can become
permanently inactive if they consistently receive negative inputs, but this is distinct from vanishing
gradients.

C is incorrect because: High learning rates cause divergence or oscillation during optimization, but this is
unrelated to the mathematical mechanism of vanishing gradients, which concerns the magnitude of
computed gradients, not how they're applied during parameter updates.

D is incorrect because: Overfitting relates to generalization performance on unseen data, not to the
propagation of gradients during training. A model can overfit while still having healthy gradient flow, or
suffer from vanishing gradients while underfitting.



Question 2: A convolutional neural network uses 64 filters of size 3×3×3 (where the last dimension represents input channels) applied to an input feature map of dimensions 32×32×3 with stride 1 and padding 'same'. What is the output volume dimension?

A) 30×30×64
B) 32×32×64
C) 32×32×3
D) 30×30×3

Correct Answer: B

Explanation:

B is correct because: With "same" padding, the spatial dimensions are preserved. The formula for the output spatial dimension with stride s, padding p, and kernel size k is: output = (input - k + 2p)/s + 1. For a 32×32 input with a 3×3 kernel, stride 1, and padding of 1 pixel on each side, this gives (32 - 3 + 2)/1 + 1 = 32. The depth equals the number of filters (64), not the input channels. Thus: 32×32×64.
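To make the arithmetic concrete, here is a minimal sketch (the helper name is my own, not from the document) that applies the formula above for both padding choices:

```python
def conv_output_size(n: int, k: int, p: int, s: int) -> int:
    """Spatial output size of a convolution: (n - k + 2p) // s + 1."""
    return (n - k + 2 * p) // s + 1

# "same" padding for a 3x3 kernel at stride 1 means p = 1.
same = conv_output_size(n=32, k=3, p=1, s=1)   # -> 32
valid = conv_output_size(n=32, k=3, p=0, s=1)  # -> 30

# Output depth is the filter count, independent of input channels.
print(f"'same'  padding: {same}x{same}x64")    # 32x32x64 (answer B)
print(f"'valid' padding: {valid}x{valid}x64")  # 30x30x64 (option A)
```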

A is incorrect because: 30×30 would be the result of "valid" padding (no padding), calculated as (32 -
3)/1 + 1 = 30. However, the question specifies "same" padding, which preserves dimensions.

C is incorrect because: This maintains the spatial dimensions correctly but incorrectly preserves the
input depth (3 channels) rather than using the number of filters (64) as the output depth. Each filter
produces one output channel.

D is incorrect because: This combines both errors—using "valid" padding spatial dimensions (30×30)
while also incorrectly maintaining input channel depth (3) instead of filter count (64).



Question 3: In the Transformer architecture, what is the primary mathematical purpose of the scaling factor √d_k in the scaled dot-product attention mechanism Attention(Q, K, V) = softmax(QK^T / √d_k) V?

A) To normalize the attention weights so they sum to 1
B) To prevent the dot products from growing too large in magnitude, which would push the softmax function into regions with extremely small gradients
C) To ensure that the query and key matrices are orthogonal
D) To convert the attention scores into probability distributions

Correct Answer: B

Explanation:

B is correct because: When d_k (the dimension of keys/queries) is large, the dot products QK^T grow in magnitude because the sum involves more terms. For random vectors with mean 0 and variance 1, the dot product has variance d_k. Large dot product values push the softmax function into regions where it saturates (near 0 or 1), producing extremely small gradients that hinder learning. Dividing by √d_k normalizes the variance to approximately 1, maintaining stable gradients.
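A quick empirical check of this variance argument (not from the document; the dimension d_k and sample count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 512            # key/query dimension (arbitrary choice)
n = 10_000           # number of sampled query/key pairs

# Entries drawn i.i.d. with mean 0 and variance 1.
q = rng.normal(size=(n, d_k))
k = rng.normal(size=(n, d_k))

dots = np.sum(q * k, axis=1)     # raw dot products q·k
scaled = dots / np.sqrt(d_k)     # scaled as in the attention formula

print(f"raw dot-product variance    = {dots.var():.1f}")    # close to d_k = 512
print(f"scaled dot-product variance = {scaled.var():.2f}")  # close to 1.0
```

The raw variance grows linearly with d_k, while the scaled version stays near 1, keeping the softmax inputs in a range where its gradients remain usable.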

A is incorrect because: The softmax function itself ensures outputs sum to 1 through its normalization
(dividing by the sum of exponentials). The scaling factor is applied before the softmax, so it doesn't
serve this normalization purpose.

C is incorrect because: The scaling factor doesn't enforce or encourage orthogonality between Q and K
matrices. Orthogonality would require specific constraints on the weight matrices during training, not a
simple scaling of dot products.

D is incorrect because: The conversion to probability distributions is accomplished by the softmax
function's exponential and normalization operations, not by the scaling factor. The scaling occurs before
this conversion and serves a different purpose.



Question 4: Which regularization technique explicitly constrains the L2 norm of the incoming weight vector for each neuron to be exactly equal to a fixed constant (typically 1)?

A) L2 regularization (weight decay)
B) Dropout
C) Batch Normalization
D) Weight Normalization
