with Verified Answers | 100% Correct | Latest 2025/2026 Update - Georgia Institute of Technology.
Policy Iteration   Policy Evaluation: Compute V(pi)
Policy Refinement: Greedily change the action as per V(pi) at the next states
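A minimal tabular sketch of the two alternating steps (the transition array p[s, a, s'], reward array r[s, a], and discount value are illustrative assumptions, not from the source):

```python
import numpy as np

def policy_iteration(p, r, gamma=0.9, tol=1e-8):
    # p[s, a, s2]: transition probabilities, r[s, a]: expected rewards (assumed given)
    n_states, n_actions = r.shape
    pi = np.zeros(n_states, dtype=int)              # arbitrary initial policy
    while True:
        # Policy Evaluation: compute V(pi) by iterating the Bellman expectation backup
        V = np.zeros(n_states)
        while True:
            V_new = np.array([r[s, pi[s]] + gamma * p[s, pi[s]] @ V
                              for s in range(n_states)])
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        # Policy Refinement: act greedily with respect to Q(s, a) computed from V(pi)
        Q = r + gamma * (p @ V)                     # shape (S, A); p @ V sums over next states
        pi_new = Q.argmax(axis=1)
        if np.array_equal(pi_new, pi):              # policy stable -> pi is optimal
            return pi, V
        pi = pi_new
```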
Why do Policy Iteration   pi_i often converges to pi* sooner than V_pi converges to V_pi*
- thus requires fewer iterations
Deep Q-Learning   - Q(s, a; w, b) = w_a^T s + b_a
MSE Loss := (Q_new(s, a) - (r + y * max_a' Q_old(s', a')))^2
- using a single Q function makes the loss unstable (the regression target shifts with every update)
--> use two Q-functions (two networks)
- Freeze Q_old and update Q_new
- Set Q_old = Q_new at regular intervals
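A minimal PyTorch sketch of the frozen-target update described above (network sizes, tensor shapes, and the sync interval are illustrative assumptions):

```python
import copy
import torch
import torch.nn as nn

gamma = 0.99
q_new = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # online network Q_new
q_old = copy.deepcopy(q_new)                                          # frozen target network Q_old
optimizer = torch.optim.Adam(q_new.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done, step, sync_every=1000):
    # s: (B, 4) states, a: (B,) long actions, r: (B,) rewards, done: (B,) float 0/1 flags
    # Target from the frozen network: r + y * max_a' Q_old(s', a'); no gradient flows into Q_old
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_old(s_next).max(dim=1).values
    # Prediction from the online network: Q_new(s, a) for the actions actually taken
    pred = q_new(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)     # MSE between Q_new and the frozen target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % sync_every == 0:                      # Set Q_old = Q_new at regular intervals
        q_old.load_state_dict(q_new.state_dict())
    return loss.item()
```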
Reinforcement Learning   Sequential decision making in an environment with evaluative feedback
Environment: may be unknown, non-linear, stochastic, and complex
Agent: learns a policy to map states of the environment to actions
- seeks to maximize long-term reward
RL: Evaluative Feedback   - Pick an action, receive a reward
- No supervision for what the correct action is or would have been (unlike supervised learning)
RL: Sequential Decisions   - Plan and execute actions over a sequence of states
- Reward may be delayed, requiring optimization of future rewards (long-term planning)
Signature Challenges in RL   Evaluative Feedback: Need trial and error to find the right action
Delayed Feedback: Actions may not lead to immediate reward
Non-stationarity: Data distribution of visited states changes when the policy changes
Fleeting nature of online data: may only see each data point once
MDP   Framework underlying RL
S: Set of states
A: Set of actions
R: Distribution of rewards
T: Transition probability
y: Discount factor
Markov Property: Current state completely characterizes the state of the environment
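A small sketch of the MDP tuple as a tabular data structure (field names and array shapes are assumptions for illustration):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MDP:
    """Tabular (S, A, R, T, y) tuple; shapes are illustrative assumptions."""
    n_states: int        # S: states indexed 0 .. n_states - 1
    n_actions: int       # A: actions indexed 0 .. n_actions - 1
    r: np.ndarray        # R: expected reward r[s, a]
    p: np.ndarray        # T: transition probability p[s, a, s'] = p(s' | s, a)
    gamma: float = 0.9   # y: discount factor

# Markov property: p conditions only on the current (s, a), never on earlier history.
```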
RL: Equations relating optimal quantities   1. V*(s) = max_a Q*(s, a)
2. pi*(s) = argmax_a Q*(s, a)
V*(s)   max_a ( sum_s' { p(s'|s, a) [ r(s, a) + y V*(s') ] } )
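A minimal value-iteration sketch that applies the Bellman optimality backup above until V converges, then reads off pi* via the argmax relation (array shapes are illustrative assumptions):

```python
import numpy as np

def value_iteration(p, r, gamma=0.9, tol=1e-8):
    # Bellman optimality backup: V*(s) = max_a sum_s' p(s'|s, a) [ r(s, a) + y V*(s') ]
    # Since the probabilities sum to 1, r(s, a) can be pulled out of the sum over s'.
    n_states, n_actions = r.shape
    V = np.zeros(n_states)
    while True:
        Q = r + gamma * (p @ V)                  # estimate of Q*(s, a), shape (S, A)
        V_new = Q.max(axis=1)                    # 1. V*(s) = max_a Q*(s, a)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new       # 2. pi*(s) = argmax_a Q*(s, a)
        V = V_new
```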