Exam

CS7643 QUIZ 4: RECURRENT NETWORKS, EMBEDDINGS & SEQUENCE MODELING

Pages: 19
Grade: A+
Uploaded on: 21-06-2026
Written in: 2025/2026

This document contains study material and practice questions for CS7643 Quiz 4, focusing on recurrent neural networks, embeddings, and sequence modeling techniques in deep learning. Topics include recurrent network architectures, sequence processing, word embeddings, language modeling, long short-term memory (LSTM) networks, gated recurrent units (GRUs), sequence-to-sequence models, attention mechanisms, training challenges, and practical applications in natural language processing and time-series analysis. It is designed to help students prepare for quizzes and strengthen their understanding of sequence-based machine learning models.

Institution: CS7643
Course: CS7643

Content preview

CS7643 QUIZ 4: RECURRENT NETWORKS, EMBEDDINGS & SEQUENCE MODELING

SECTION A: RECURRENT NEURAL NETWORKS (10 Questions)

Q1: In a vanilla RNN with update rule h(t) = tanh(U·x(t) + V·h(t-1) + b), what is the primary computational disadvantage during training?

A. The model requires O(T²) memory to store all intermediate hidden states.
B. The forward pass cannot be parallelized across time steps due to sequential dependency. [CORRECT]
C. The backward pass can be fully parallelized using modern GPU architectures.
D. The number of parameters scales linearly with sequence length T.

Correct Answer: B

Rationale: Correct because the hidden state h(t) depends on h(t-1), forcing sequential computation with runtime O(T) that cannot be parallelized across the time dimension.
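
The sequential dependency in Q1 can be seen in a minimal sketch of the forward pass. This is a scalar toy version of the update rule, not a reference implementation; the function name `rnn_forward` and the weight values are assumptions chosen only for illustration.

```python
import math

def rnn_forward(xs, U, V, b, h0=0.0):
    """Scalar vanilla RNN: h(t) = tanh(U*x(t) + V*h(t-1) + b)."""
    h = h0
    states = []
    for x in xs:
        # Step t needs h from step t-1, so the loop cannot be
        # split across time steps and run in parallel.
        h = math.tanh(U * x + V * h + b)
        states.append(h)
    return states

# Illustrative (hypothetical) inputs and weights.
states = rnn_forward([1.0, 0.5, -0.3], U=0.8, V=0.5, b=0.0)
print(len(states))  # one hidden state per time step
```

Each loop iteration reads the value written by the previous one, which is the O(T) chain of dependent steps the rationale describes.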

Q2: A vanilla RNN is trained on sequences of length T=100. Analysis shows that gradients with respect to early time step inputs are approximately zero. What is the most likely cause?

A. The learning rate is too high, causing gradient descent to oscillate.
B. The weight matrix V has spectral radius less than 1, causing vanishing gradients. [CORRECT]
C. The activation function is ReLU rather than tanh.
D. The input dimension is larger than the hidden dimension.

Correct Answer: B

Rationale: Correct because the gradient reaching an early time step is a product of Jacobians ∂h(t)/∂h(t-1) = diag(1 - h(t)²)·V, one per intervening step. The tanh derivative is at most 1, so when V is contractive (spectral radius below 1 in the usual statement; largest singular value below 1 for a strict guarantee) the product decays roughly like V^t, producing vanishing gradients for early time steps.
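
The decay in Q2 can be checked numerically with a scalar RNN, where the Jacobian at each step reduces to the number V·(1 - h(t)²). The sketch below is illustrative only; `grad_wrt_initial_state` and the chosen inputs and weights are assumptions, not part of the quiz.

```python
import math

def grad_wrt_initial_state(xs, U, V, b, h0=0.0):
    """Return d h(T) / d h(0) for a scalar tanh RNN."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(U * x + V * h + b)
        # Scalar Jacobian dh(t)/dh(t-1) = V * tanh'(a(t)) = V * (1 - h(t)^2)
        grad *= V * (1.0 - h * h)
    return grad

# T = 100 steps with |V| < 1 (hypothetical values).
xs = [0.1] * 100
g = grad_wrt_initial_state(xs, U=1.0, V=0.5, b=0.0)
print(g)  # extremely small: bounded above by 0.5 ** 100
```

Because every factor is at most |V| = 0.5 in magnitude, the product over 100 steps cannot exceed 0.5^100, which is why the earliest inputs receive almost no gradient.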

Q3: Which RNN architecture is most appropriate for sentiment classification, where a single sentiment label must be produced for an input sentence of variable length?

A. N-to-N architecture with one output per word.
B. N-to-1 architecture that maps the final hidden state to a single output. [CORRECT]
C. 1-to-N architecture that generates a sequence from a single input vector.
D. Encoder-decoder with attention over all intermediate states.

Correct Answer: B

Rationale: Correct because sentiment classification requires mapping a variable-length input sequence to a single output label, which is precisely the N-to-1 architecture where the final hidden state encodes the entire sequence.
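
As a rough sketch of the N-to-1 pattern in Q3, the toy classifier below reads a sequence of any length and applies a sigmoid readout to the final hidden state only. The scalar weights, the readout parameters `w_out` and `b_out`, and the function name are all hypothetical.

```python
import math

def classify_sequence(xs, U=0.8, V=0.5, b=0.0, w_out=2.0, b_out=0.0):
    """N-to-1: consume the whole sequence, emit one probability."""
    h = 0.0
    for x in xs:
        h = math.tanh(U * x + V * h + b)
    # Only the final hidden state is passed to the output layer.
    logit = w_out * h + b_out
    return 1.0 / (1.0 + math.exp(-logit))

# The same function handles inputs of different lengths.
p_short = classify_sequence([0.9, 0.4])
p_long = classify_sequence([0.9, 0.4, -0.7, 0.2, 0.6])
print(p_short, p_long)
```

The output size stays fixed at one value no matter how many time steps are consumed, which is what distinguishes this from the N-to-N and 1-to-N options.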

Q4: During training of a vanilla RNN, gradient norms suddenly spike to values exceeding 1000. Which technique should be applied?

A. Reduce the learning rate by a factor of 10.
B. Apply gradient clipping to bound the maximum gradient norm. [CORRECT]
C. Switch from SGD to Adam optimizer immediately.
D. Increase the hidden state dimension to absorb larger gradients.

Correct Answer: B

Rationale: Correct because exploding gradients can occur when the spectral radius of the recurrent weights exceeds 1; gradient clipping rescales the gradient whenever its norm exceeds a chosen threshold, bounding the update size during backpropagation through time without modifying the architecture.
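
Clipping by norm, as referred to in Q4, can be sketched in a few lines. This is a plain-Python illustration over a flat list of gradient values; the helper name `clip_by_norm` and the threshold of 5.0 are assumptions, and deep learning frameworks ship their own utilities for the same purpose.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)          # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]  # same direction, bounded length

# A gradient with norm 1000 is scaled down to norm 5.
clipped = clip_by_norm([600.0, -800.0], max_norm=5.0)
print(clipped)
```

The rescaling keeps the gradient direction and changes only its length, so the optimizer still moves downhill but with a bounded step.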

Q5: In teacher forcing during RNN training, what input is fed at time step t+1?

A. The model's own predicted output from time step t.
B. The ground-truth value from the training data (the true output for time step t), not the model's prediction. [CORRECT]
C. A weighted average of the prediction and ground truth.
D. The hidden state from time step t passed through the output layer.

Correct Answer: B

Rationale: Correct because teacher forcing uses the actual training data value as the next input rather than the model's prediction. The procedure follows from maximum likelihood training and keeps early prediction errors from compounding during training.
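
The difference between teacher forcing and free-running decoding in Q5 comes down to which value becomes the next input. The sketch below uses a deliberately wrong toy "model" (`step_fn`, a hypothetical stand-in that always adds 2 when the true sequence increases by 1) so the effect on the inputs is easy to see.

```python
def decode(step_fn, targets, start_token, teacher_forcing):
    """Run a one-step predictor over a sequence and record its inputs."""
    inputs_used, outputs = [], []
    prev = start_token
    for target in targets:
        inputs_used.append(prev)
        pred = step_fn(prev)
        outputs.append(pred)
        # Teacher forcing: next input is the ground truth.
        # Free running: next input is the model's own prediction.
        prev = target if teacher_forcing else pred
    return inputs_used, outputs

step_fn = lambda prev: prev + 2      # hypothetical, deliberately biased model
targets = [1, 2, 3, 4]

tf_inputs, tf_outputs = decode(step_fn, targets, 0, teacher_forcing=True)
fr_inputs, fr_outputs = decode(step_fn, targets, 0, teacher_forcing=False)
print(tf_inputs, tf_outputs)  # [0, 1, 2, 3] [2, 3, 4, 5]
print(fr_inputs, fr_outputs)  # [0, 2, 4, 6] [2, 4, 6, 8]
```

With teacher forcing the error stays at a constant offset of 1 per step, while in free-running mode each wrong prediction is fed back and the error grows with every step.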

Q6: A researcher replaces hidden-to-hidden recurrence with teacher forcing at every time step during both training and inference. What is the primary consequence?

A. The model becomes unable to handle variable-length sequences.
B. The model can be parallelized across time steps but loses the ability to propagate information through hidden states. [CORRECT]
C. The vanishing gradient problem is completely eliminated.
D. The model requires twice as many parameters as a standard RNN.

Correct Answer: B

Rationale: Correct because removing hidden-to-hidden recurrence eliminates the sequential dependency chain, enabling parallelization, but the model loses the recurrent path for propagating information across time steps, making it less powerful than a true RNN.
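
One way to picture the setup in Q6 is a step function that depends only on the current input and the previous ground-truth output, with no h(t-1) term. The sketch below assumes that form; the weight `W` on the previous output, the other weight values, and the data are hypothetical.

```python
import math

def step(x_t, y_prev, U=0.8, W=0.3, b=0.0):
    """Hidden state from the current input and previous TRUE output only."""
    return math.tanh(U * x_t + W * y_prev + b)

xs = [1.0, 0.5, -0.3, 0.2]
ys = [0.0, 1.0, 0.0, 1.0]            # ground-truth outputs (teacher forcing)
y_prev = [0.0] + ys[:-1]             # y(t-1) for every step, known up front

pairs = list(zip(xs, y_prev))
forward_order = [step(x, y) for x, y in pairs]
# Evaluate the steps in reverse order, then restore the original ordering.
reverse_order = [step(x, y) for x, y in reversed(pairs)][::-1]
print(forward_order == reverse_order)  # True: steps are independent
```

Since every step's inputs are available before any computation starts, the steps can run in any order or in parallel; the cost is that nothing computed at step t-1 can influence step t except through the supplied output value.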

Q7: Truncated backpropagation through time (BPTT) with truncation parameter k=10 on sequences of length T=100 means:

A. Only the first 10 time steps are used in the forward pass.
B. Gradients are backpropagated through at most 10 time steps before truncation. [CORRECT]
C. The hidden state is reset to zero every 10 time steps.
D. The model processes the sequence in 10 non-overlapping chunks.

Correct Answer: B

Rationale: Correct because truncated BPTT limits the temporal span of gradient computation to k steps, approximating full BPTT while controlling computational cost and mitigating vanishing/exploding gradients in long sequences.
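
The effect of the truncation parameter in Q7 can be illustrated on a scalar RNN by computing the gradient of the final hidden state with respect to V, stopping the backward pass after k steps. This is a simplified sketch: the function `grad_V`, the inputs, and the weights are assumptions, and practical implementations usually apply truncation chunk by chunk while carrying the hidden state forward.

```python
import math

def grad_V(xs, U, V, b, k=None):
    """d h(T) / d V for a scalar tanh RNN, backpropagating at most k steps."""
    hs = [0.0]
    for x in xs:                      # forward pass still covers all T steps
        hs.append(math.tanh(U * x + V * hs[-1] + b))
    T = len(xs)
    steps = T if k is None else min(k, T)
    grad, upstream = 0.0, 1.0         # upstream = d h(T) / d h(t)
    for t in range(T, T - steps, -1):
        local = 1.0 - hs[t] ** 2      # tanh'(a(t))
        grad += upstream * local * hs[t - 1]  # direct effect of V at step t
        upstream *= local * V         # carry gradient back to h(t-1)
    return grad

xs = [math.sin(0.3 * t) for t in range(100)]
full = grad_V(xs, U=0.8, V=0.5, b=0.1)             # full BPTT
truncated = grad_V(xs, U=0.8, V=0.5, b=0.1, k=10)  # last 10 steps only
print(full, truncated)
```

The forward pass is unchanged; only the backward loop is cut short, so the truncated value omits the contributions of the 90 earliest steps.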

Q8: Which of the following is NOT a valid criticism of using MLPs for NLP tasks compared to RNNs?

A. MLPs cannot easily support variable-sized input sequences.
B. MLPs have no inherent mechanism for modeling temporal structure.
C. MLPs require network size to grow with maximum allowed sequence length.
D. MLPs suffer from vanishing gradients across time steps. [CORRECT]

Correct Answer: D

Rationale: Correct because vanishing gradients across time steps is a problem
specific to recurrent architectures with repeated weight multiplication; MLPs
