Exam

CS7643 QUIZ 5 QUESTIONS WITH DETAILED VERIFIED ANSWERS (100% CORRECT ANSWERS) /ALREADY GRADED A +

Grade

A+

Uploaded on

24-04-2026

Written in

2025/2026

Institution

CS7643

Course

CS7643

Neural Attention - ANSWER-- A weighting or probability distribution over inputs that depends on the
computational state and the inputs

- How is it computed?

1. "Hard" - samples are drawn from the distribution over the inputs

2. "Soft" - the distribution is used directly as the weights of a weighted average

- Allows information to propagate between distant computational nodes while making minimal structural
assumptions

- The most standard form of attention is softmax
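The difference between the "soft" and "hard" variants can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes dot-product scores between a state vector q and the rows of an input matrix U, and the names softmax, soft_attention, hard_attention and the toy data are all hypothetical.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def soft_attention(q, U):
    """Soft: use the distribution directly as weights of an average over rows of U."""
    weights = softmax(U @ q)  # assumption: dot-product scoring
    return weights @ U, weights

def hard_attention(q, U, rng):
    """Hard: draw one row of U by sampling from the same distribution."""
    weights = softmax(U @ q)
    index = rng.choice(len(U), p=weights)
    return U[index], index

# Hypothetical toy data: 4 inputs with 3 features each, and one state vector.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))
q = rng.normal(size=3)

soft_summary, weights = soft_attention(q, U)
hard_summary, picked = hard_attention(q, U, rng)
```

The soft version is differentiable end to end, while the hard version returns a single sampled input and so needs a sampling-based estimator to train.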

Softmax Properties - ANSWER-- Outputs are non-negative and sum to one (gives a probability distribution
whatever the input)

- Permutation equivariant: reordering the inputs reorders the outputs the same way, so a weighted average
built from them does not depend on input order

- Not linear

- Doubling the inputs puts more mass on the largest input

- Softmax is differentiable
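These properties are easy to check numerically. The snippet below is a small sketch on a hypothetical input vector [1, 2, 3]; the variable names are illustrative only. Note that reordering the inputs only reorders the weights, so any weighted average built from them is unchanged.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()

x = np.array([1.0, 2.0, 3.0])  # hypothetical scores
p = softmax(x)

# Outputs sum to one.
total = p.sum()

# Reordering the inputs reorders the outputs in the same way.
perm = np.array([2, 0, 1])
permuted_matches = np.allclose(softmax(x[perm]), p[perm])

# Not linear: softmax(2x) differs from 2 * softmax(x).
is_linear = np.allclose(softmax(2 * x), 2 * p)

# Doubling the inputs moves mass toward the largest input.
mass_before = p.max()                # about 0.665
mass_after = softmax(2 * x).max()    # about 0.867
```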

Softmax Attention vs Final Layer of MLP - ANSWER-- Attention:

- q is an internal hidden state, U is the embeddings of the inputs (from the previous layer)

- The distribution corresponds to a summary of U

MLP:

- q is the last hidden state, U is the embeddings of the class labels

- The distribution corresponds to labelings (outputs)
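The comparison above comes down to the same computation, softmax(U q), with different meanings for q and U. A rough sketch under that assumption (random toy matrices, hypothetical names) makes the parallel explicit:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()

rng = np.random.default_rng(0)
d = 4  # hypothetical embedding size

# Attention: U holds embeddings of 5 inputs, q is an internal hidden state.
U_inputs = rng.normal(size=(5, d))
q_hidden = rng.normal(size=d)
attn_weights = softmax(U_inputs @ q_hidden)  # distribution over inputs
summary = attn_weights @ U_inputs            # summary of U

# MLP final layer: U holds embeddings of 3 class labels, q is the last hidden state.
U_classes = rng.normal(size=(3, d))
q_last = rng.normal(size=d)
class_probs = softmax(U_classes @ q_last)    # distribution over labels
prediction = int(np.argmax(class_probs))
```

In the attention case the distribution is consumed inside the network as a summary vector; in the MLP case the distribution itself is the output.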

Position Embedding - ANSWER-- A vector that depends only on the location in the sequence and is
added to the input placed at that location.

- Adds information about the absolute and relative locations of inputs

--> Needed in transformer architectures because they are attention based rather than sequential, so
without it the model has no notion of input order
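As one possible illustration, the sketch below builds a fixed sinusoidal position table and adds it to token embeddings. The sinusoidal form is an assumption chosen for the example (learned position vectors are another common choice), and sinusoidal_positions, the sizes, and the random embeddings are hypothetical. It assumes an even d_model.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Return a (seq_len, d_model) table; row i depends only on position i.

    Assumes d_model is even.
    """
    positions = np.arange(seq_len)[:, None]   # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]  # shape (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    table = np.zeros((seq_len, d_model))
    table[:, 0::2] = np.sin(angles)  # even feature indices
    table[:, 1::2] = np.cos(angles)  # odd feature indices
    return table

# Hypothetical token embeddings for a 6-token sequence.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 8))

position_table = sinusoidal_positions(6, 8)
inputs = token_embeddings + position_table  # added, not concatenated
```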

Transformers - ANSWER-- Multi-layer attention model that is state of the art in most language tasks

- Superior to previous attention architectures because of:

1. Multi-query hidden-state propagation ("self-attention") (the most important one)

2. Multi-head attention

3. Residual connections and LayerNorm

Transformers: Self Attention (Multi-query hidden-state propagation) - ANSWER-- Improves on softmax
attention by having a controller (query) for every input, so the size of the controller state grows with the
input size
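A minimal sketch of this idea, assuming the common scaled dot-product formulation: every input row produces its own query, so the attention weights form an n-by-n matrix instead of a single distribution. The projection matrices W_q, W_k, W_v and the toy sizes are hypothetical.

```python
import numpy as np

def softmax_rows(scores):
    """Numerically stable softmax applied to each row of a 2-D array."""
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Each of the n inputs issues its own query over all n inputs."""
    Q = X @ W_q  # one query (controller) per input
    K = X @ W_k
    V = X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # assumption: scaled dot product
    weights = softmax_rows(scores)           # shape (n, n), rows sum to one
    return weights @ V, weights

# Hypothetical sizes: 5 tokens, model width 8, projection width 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))

outputs, weights = self_attention(X, W_q, W_k, W_v)
```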

Transformers: Multi-head attention - ANSWER-- Combines multiple attention "heads" that are trained in the
same way on the same data but with different weight matrices

- Each of the L attention heads yields values for each token; these values are then multiplied by trained
parameters and added
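The "multiplied by trained parameters and added" step can be sketched as a sum of per-head output projections. This is an illustrative sketch assuming scaled dot-product heads; attention_head, multi_head_attention, the random weights, and the sizes are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def softmax_rows(scores):
    """Numerically stable softmax applied to each row of a 2-D array."""
    exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def attention_head(X, W_q, W_k, W_v):
    """One head: scaled dot-product self-attention with its own weights."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    weights = softmax_rows(Q @ K.T / np.sqrt(K.shape[-1]))
    return weights @ V

def multi_head_attention(X, heads, W_o):
    """Project each head's values with its own output matrix, then add."""
    total = np.zeros((X.shape[0], W_o[0].shape[1]))
    for (W_q, W_k, W_v), W_out in zip(heads, W_o):
        total += attention_head(X, W_q, W_k, W_v) @ W_out
    return total

# Hypothetical sizes: 5 tokens, model width 8, head width 4, L = 2 heads.
rng = np.random.default_rng(0)
n_tokens, d_model, d_head, L = 5, 8, 4, 2
X = rng.normal(size=(n_tokens, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3)) for _ in range(L)]
W_o = [rng.normal(size=(d_head, d_model)) for _ in range(L)]

output = multi_head_attention(X, heads, W_o)
```

Summing per-head projections gives the same result as concatenating the head outputs and applying one stacked output matrix, which is how the step is often written.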

Causal Attention - ANSWER-- An attention mask (a way of putting a graph structure on a transformer)

- Masks out attention weights that don't go from left to right, so each token attends only to itself and
earlier tokens

--> Training code outputs a prediction at each token simultaneously (and takes gradients simultaneously)

--> Massively speeds up training (by a factor of the context size)

--> Not necessary for masked language models like BERT
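A small sketch of the mask itself, assuming raw attention scores are already available as an n-by-n matrix (row = attending token, column = attended token). The function name and the random scores are hypothetical.

```python
import numpy as np

def causal_attention_weights(scores):
    """Softmax over each row after hiding every position to the right of the diagonal."""
    n = scores.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
    masked = np.where(future, -np.inf, scores)          # future positions get zero weight
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for a 4-token sequence.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))

weights = causal_attention_weights(scores)
```

Row i of the result only uses tokens 0 through i, so all n next-token predictions can be computed in one pass over the sequence rather than one pass per prefix.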

Attention vs. Seq2Seq Modeling - ANSWER-- Plain Seq2Seq passes a single context (the last encoder hidden
state) to the decoder; attention passes all encoder hidden states to the decoder

- At each step the decoder computes a weighted sum of all hidden states to obtain a single context vector
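The contrast can be sketched as follows, assuming dot-product scoring between the current decoder state and each encoder hidden state. The function attention_context and the random states are hypothetical.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Weighted sum of all encoder hidden states, weighted by relevance to the decoder state."""
    scores = encoder_states @ decoder_state  # assumption: dot-product scoring
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    return weights @ encoder_states, weights

# Hypothetical encoder output: 6 source positions, hidden size 4.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 4))
decoder_state = rng.normal(size=4)

# Plain Seq2Seq: the decoder only ever sees the last hidden state.
seq2seq_context = encoder_states[-1]

# Attention: the decoder builds a fresh context from all hidden states.
context, weights = attention_context(decoder_state, encoder_states)
```

Because the context is recomputed from the current decoder state, a different decoding step can focus on a different part of the source sequence.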

BERT is a stack of - ANSWER-- Transformer encoder modules

Document information

Uploaded on: April 24, 2026
Number of pages: 5
Written in: 2025/2026
Type: Exam
Contains: Questions and answers

Seller

DoctorDee
