Exam

CS7643 QUIZ 4: RECURRENT NETWORKS, EMBEDDINGS & SEQUENCE MODELING

Grade: A+
Uploaded on: 21-06-2026
Written in: 2025/2026

This document contains study material and practice questions for CS7643 Quiz 4, focusing on recurrent neural networks, embeddings, and sequence modeling techniques in deep learning. Topics include recurrent network architectures, sequence processing, word embeddings, language modeling, long short-term memory (LSTM) networks, gated recurrent units (GRUs), sequence-to-sequence models, attention mechanisms, training challenges, and practical applications in natural language processing and time-series analysis. It is designed to help students prepare for quizzes and strengthen their understanding of sequence-based machine learning models.


Institution: CS7643
Course: CS7643

Content preview

CS7643 QUIZ 4: RECURRENT NETWORKS, EMBEDDINGS & SEQUENCE MODELING

SECTION A: RECURRENT NEURAL NETWORKS (10 Questions)

Q1: In a vanilla RNN with update rule h(t) = tanh(U·x(t) + V·h(t-1) + b), what is the primary computational disadvantage during training?

A. The model requires O(T²) memory to store all intermediate hidden states.
B. The forward pass cannot be parallelized across time steps due to sequential dependency. [CORRECT]
C. The backward pass can be fully parallelized using modern GPU architectures.
D. The number of parameters scales linearly with sequence length T.

Correct Answer: B

Rationale: Correct because the hidden state h(t) depends on h(t-1), forcing sequential computation with runtime O(T) that cannot be parallelized across the time dimension.
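
The sequential dependency can be seen in a minimal pure-Python sketch of the update rule from Q1. The function name `rnn_forward`, the helper `matvec`, and the toy weights are illustrative assumptions, not part of the quiz material.

```python
import math

def rnn_forward(xs, U, V, b, h0):
    """Run h(t) = tanh(U·x(t) + V·h(t-1) + b) over a sequence.

    xs: list of input vectors; U, V: weight matrices as lists of rows;
    b, h0: bias vector and initial hidden state.
    """
    def matvec(M, vec):
        return [sum(m * x for m, x in zip(row, vec)) for row in M]

    h = h0
    states = []
    for x in xs:  # step t needs h from step t-1, so the loop is inherently sequential
        pre = [ux + vh + bi for ux, vh, bi in zip(matvec(U, x), matvec(V, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

# Toy example: 1-D input, 2-D hidden state, T = 3 (values chosen arbitrarily).
U = [[0.5], [-0.3]]
V = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.0]
states = rnn_forward([[1.0], [0.5], [-1.0]], U, V, b, h0=[0.0, 0.0])
```

Note that U, V, and b are reused at every step, so the parameter count does not depend on T, which is why option D is wrong.
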

Q2: A vanilla RNN is trained on sequences of length T=100. Analysis shows that gradients with respect to early time step inputs are approximately zero. What is the most likely cause?

A. The learning rate is too high, causing gradient descent to oscillate.
B. The weight matrix V has spectral radius less than 1, causing vanishing gradients. [CORRECT]
C. The activation function is ReLU rather than tanh.
D. The input dimension is larger than the hidden dimension.

Correct Answer: B

Rationale: Correct because backpropagation through time multiplies one Jacobian ∂h(t)/∂h(t-1) = diag(1 − h(t)²)·V per step. The tanh derivative is at most 1, so when the spectral radius of V is less than 1 the product over k steps decays roughly like V^k, and gradients reaching early time steps vanish exponentially.
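
A scalar version of the RNN makes the decay easy to check numerically. This is a sketch under simplifying assumptions: the hidden state is one-dimensional, so the recurrent "matrix" is a single weight `v`, and the function name and input values are hypothetical.

```python
import math

def grad_wrt_initial_state(v, u, xs, h0=0.0):
    """Return dh(T)/dh(0) for the scalar RNN h(t) = tanh(v*h(t-1) + u*x(t))."""
    h, grad = h0, 1.0
    for x in xs:
        h = math.tanh(v * h + u * x)
        grad *= v * (1.0 - h * h)  # chain rule: one Jacobian factor per time step
    return grad

xs = [0.1] * 100                  # T = 100, arbitrary constant input
g_small = grad_wrt_initial_state(v=0.9, u=1.0, xs=xs)
bound = 0.9 ** 100                # |gradient| <= |v|^T because tanh' <= 1
```

With |v| = 0.9 the bound 0.9^100 is already below 1e-4, so the influence of the earliest state on the final one is negligible.
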
Q3: Which RNN architecture is most appropriate for sentiment classification, where a single sentiment label must be produced for an input sentence of variable length?

A. N-to-N architecture with one output per word.
B. N-to-1 architecture that maps the final hidden state to a single output. [CORRECT]
C. 1-to-N architecture that generates a sequence from a single input vector.
D. Encoder-decoder with attention over all intermediate states.

Correct Answer: B

Rationale: Correct because sentiment classification requires mapping a variable-length input sequence to a single output label, which is precisely the N-to-1 architecture where the final hidden state encodes the entire sequence.
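
As a rough illustration of the N-to-1 pattern, the sketch below runs a scalar RNN over a sequence and applies a sigmoid to the final hidden state only. The function name `classify_sequence`, the scalar weights, and the input values are assumptions made for the example.

```python
import math

def classify_sequence(xs, u, v, w, b=0.0):
    """N-to-1 sketch: scalar RNN over xs, then a sigmoid on the final hidden state."""
    h = 0.0
    for x in xs:                   # consume the whole sequence, whatever its length
        h = math.tanh(u * x + v * h + b)
    logit = w * h                  # only the final hidden state reaches the output
    return 1.0 / (1.0 + math.exp(-logit))

# Sequences of different lengths each yield exactly one probability.
p_short = classify_sequence([0.2, -0.1], u=1.0, v=0.5, w=2.0)
p_long = classify_sequence([0.2, -0.1, 0.4, 0.3, -0.6], u=1.0, v=0.5, w=2.0)
```
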

Q4: During training of a vanilla RNN, gradient norms suddenly spike to values exceeding 1000. Which technique should be applied?

A. Reduce the learning rate by a factor of 10.
B. Apply gradient clipping to bound the maximum gradient norm. [CORRECT]
C. Switch from SGD to Adam optimizer immediately.
D. Increase the hidden state dimension to absorb larger gradients.

Correct Answer: B

Rationale: Correct because exploding gradients can occur when the spectral radius of the recurrent weights exceeds 1; gradient clipping directly bounds the norm of the gradient computed by backpropagation through time, without modifying the architecture.
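
Clipping by global norm can be sketched in a few lines. The helper name `clip_by_global_norm`, the threshold of 5.0, and the example gradients are illustrative assumptions; deep learning frameworks ship their own versions of this operation.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        return grads, total
    scale = max_norm / total       # same direction, smaller magnitude
    return [[g * scale for g in vec] for vec in grads], total

grads = [[600.0, 800.0], [0.0]]    # joint norm is 1000
clipped, norm_before = clip_by_global_norm(grads, max_norm=5.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Because every component is multiplied by the same factor, the update direction is preserved and only the step size is bounded.
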

Q5: In teacher forcing during RNN training, what input is fed at time step t+1?

A. The model's own predicted output from time step t.
B. The ground-truth target value from the training data at time step t+1. [CORRECT]
C. A weighted average of the prediction and ground truth.
D. The hidden state from time step t passed through the output layer.

Correct Answer: B

Rationale: Correct because teacher forcing feeds the actual training data value as the next input in place of the model's own prediction from step t. The procedure follows from maximum likelihood estimation and keeps early prediction errors from accumulating during training.
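
For next-token prediction, teacher forcing amounts to shifting the ground-truth sequence right by one step. The sketch below assumes a hypothetical start token `<s>` and a made-up three-word sequence.

```python
def teacher_forcing_pairs(tokens, start="<s>"):
    """Return (input, target) pairs for next-token training with teacher forcing."""
    inputs = [start] + tokens[:-1]     # ground truth shifted right by one step
    return list(zip(inputs, tokens))

pairs = teacher_forcing_pairs(["the", "cat", "sat"])
# pairs == [("<s>", "the"), ("the", "cat"), ("cat", "sat")]
```

Each input comes from the training data, never from the model's own earlier prediction.
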

Q6: A researcher replaces hidden-to-hidden recurrence with teacher forcing at every time step during both training and inference. What is the primary consequence?

A. The model becomes unable to handle variable-length sequences.
B. The model can be parallelized across time steps but loses the ability to propagate information through hidden states. [CORRECT]
C. The vanishing gradient problem is completely eliminated.
D. The model requires twice as many parameters as a standard RNN.

Correct Answer: B

Rationale: Correct because removing hidden-to-hidden recurrence eliminates the sequential dependency chain, enabling parallelization, but the model loses the recurrent path for propagating information across time steps, making it less powerful than a true RNN.
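
One way to picture the parallelism is a step function that reads only x(t) and the ground-truth y(t-1). This is a sketch of the training-time case under assumed scalar weights and made-up data; the name `step` and the 0.0 placeholder before the first step are hypothetical.

```python
import math

def step(x_t, y_prev, u, w, b=0.0):
    """One step with no hidden-to-hidden path: depends only on x(t) and the true y(t-1)."""
    return math.tanh(u * x_t + w * y_prev + b)

xs = [0.5, -0.2, 0.1, 0.9]
ys = [1.0, 0.0, 1.0, 0.0]              # ground-truth outputs (hypothetical values)
y_shifted = [0.0] + ys[:-1]            # y(t-1), with 0.0 standing in before the first step

# No step reads another step's hidden state, so any evaluation order gives the same result.
forward_order = [step(x, y, u=1.0, w=0.5) for x, y in zip(xs, y_shifted)]
reverse_order = [step(x, y, u=1.0, w=0.5)
                 for x, y in reversed(list(zip(xs, y_shifted)))][::-1]
```

Since the steps are independent they could be evaluated in one batched operation, but nothing computed at step t can influence step t+1 except through the supplied ground truth.
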

Q7: Truncated backpropagation through time (BPTT) with truncation parameter k=10 on sequences of length T=100 means:

A. Only the first 10 time steps are used in the forward pass.
B. Gradients are backpropagated through at most 10 time steps before truncation. [CORRECT]
C. The hidden state is reset to zero every 10 time steps.
D. The model processes the sequence in 10 non-overlapping chunks.

Correct Answer: B

Rationale: Correct because truncated BPTT limits the temporal span of gradient computation to k steps while the forward pass still covers the full sequence. It approximates full BPTT, controls computational cost, and limits how many Jacobian factors can compound in long sequences, at the price of providing no gradient signal for dependencies longer than k steps.
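
The effect of truncation can be checked on a scalar RNN. The sketch below makes several simplifying assumptions: the "loss" is the final hidden state itself, the gradient is taken with respect to the recurrent weight `v`, and the sum over the last k steps is accumulated with a forward recursion that yields the same value as backpropagating k steps. All names and values are illustrative.

```python
import math

def grad_wrt_v(xs, u, v, k=None):
    """d h(T) / d v for h(t) = tanh(u*x(t) + v*h(t-1)), using at most the last k steps.

    k=None means full BPTT. The forward pass always runs over the whole sequence.
    """
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(u * x + v * hs[-1]))
    T = len(xs)
    start = 1 if k is None else max(1, T - k + 1)
    dh_dv = 0.0                        # h(start-1) is treated as a constant
    for t in range(start, T + 1):
        dh_dv = (1.0 - hs[t] ** 2) * (hs[t - 1] + v * dh_dv)
    return dh_dv

xs = [0.3] * 100                       # T = 100, arbitrary constant input
full = grad_wrt_v(xs, u=1.0, v=0.5)
truncated = grad_wrt_v(xs, u=1.0, v=0.5, k=10)
```

With these values the truncated gradient is very close to the full one, because the contributions from steps more than 10 back have already decayed; with a larger recurrent weight the gap would be wider.
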

Q8: Which of the following is NOT a valid criticism of using MLPs for NLP tasks compared to RNNs?

A. MLPs cannot easily support variable-sized input sequences.
B. MLPs have no inherent mechanism for modeling temporal structure.
C. MLPs require network size to grow with maximum allowed sequence length.
D. MLPs suffer from vanishing gradients across time steps. [CORRECT]

Correct Answer: D

Rationale: Correct because vanishing gradients across time steps is a problem specific to recurrent architectures with repeated weight multiplication; MLPs

Document information

Uploaded on: June 21, 2026
Number of pages: 19
Written in: 2025/2026
Type: Exam
Contains: Questions and answers

Topics

recurrent neural networks
word embeddings
sequence modeling
language modeling
LSTM
GRU
attention mechanisms
deep learning
sequence to sequence models
natural language processing


