Exam (elaborations)

Last Quiz: CS 7643 Deep Learning | Questions with Verified Answers | 100% Correct| Latest 2025/2026 Update - Georgia Institute of Technology.

Pages
16
Grade
A+
Uploaded on
27-03-2025
Written in
2024/2025

Institution
CS 7643 Deep Learning
Course
CS 7643 Deep Learning












Content preview


Policy Iteration
- Policy Evaluation: compute V_pi for the current policy pi
- Policy Refinement: greedily change the action in each state according to V_pi at the next states
- Why do Policy Iteration: the policy pi_i often converges to pi* sooner than V_pi converges to V_pi*, so it requires fewer iterations
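The evaluate-then-refine loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-state MDP `P`, the discount `GAMMA`, and the helper names `q_value`, `evaluate`, and `policy_iteration` are hypothetical and do not come from the notes.

```python
# Hypothetical toy MDP (not from the notes): 2 states, 2 actions.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted value of s'."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate(policy, tol=1e-10):
    """Policy evaluation: repeat the backup for a fixed policy until V stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration():
    policy = {s: 0 for s in P}  # arbitrary initial policy
    rounds = 0
    while True:
        rounds += 1
        V = evaluate(policy)
        # Policy refinement: act greedily with respect to V at the next states.
        new_policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
        if new_policy == policy:
            return policy, V, rounds
        policy = new_policy


policy, V, rounds = policy_iteration()
print(policy, rounds)
```

On this toy problem the policy stops changing after the second round, even though each evaluation needs many sweeps for the values to settle, which is the point of the "why" bullet.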




Deep Q-Learning
- Q(s, a; w, b) = w_a^T s + b_a
- MSE Loss := (Q_new(s, a) - (r + γ * max_a' Q_old(s', a')))^2
- Using a single Q-function makes the loss unstable, because the target moves with every update
  --> use two Q-networks (NNs)
- Freeze Q_old and update Q_new
- Set Q_old = Q_new at regular intervals
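A minimal sketch of the frozen-target update, using the linear Q-function from the card rather than a neural network. The transition, learning rate, `GAMMA`, and `SYNC_EVERY` are illustrative assumptions, and the function names are hypothetical.

```python
import copy

GAMMA = 0.9      # assumed discount factor
SYNC_EVERY = 5   # hypothetical interval for copying Q_new into Q_old


def q_value(params, s, a):
    """Linear Q-function: Q(s, a; w, b) = w_a^T s + b_a."""
    w, b = params
    return sum(wi * si for wi, si in zip(w[a], s)) + b[a]


def td_step(q_new, q_old, transition, lr=0.1):
    """One SGD step on (Q_new(s, a) - target)^2; returns the loss before the step."""
    s, a, r, s_next = transition
    n_actions = len(q_old[1])
    # The target is built from the frozen parameters, so it is a constant
    # with respect to the parameters being updated.
    target = r + GAMMA * max(q_value(q_old, s_next, a2) for a2 in range(n_actions))
    error = q_value(q_new, s, a) - target
    w, b = q_new
    # Gradients: d(error^2)/dw_a = 2 * error * s, d(error^2)/db_a = 2 * error
    w[a] = [wi - lr * 2 * error * si for wi, si in zip(w[a], s)]
    b[a] -= lr * 2 * error
    return error ** 2


# Two actions, two state features, all parameters start at zero.
q_new = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
q_old = copy.deepcopy(q_new)                    # frozen copy
transition = ([1.0, 0.0], 1, 1.0, [0.0, 1.0])   # made-up (s, a, r, s')

losses = []
for step in range(1, 11):
    losses.append(td_step(q_new, q_old, transition))
    if step % SYNC_EVERY == 0:
        q_old = copy.deepcopy(q_new)            # set Q_old = Q_new
```

Between syncs the target is fixed and the loss shrinks steadily; right after a sync the target changes and the loss jumps, which shows why a target that moved on every step would be hard to chase.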

Reinforcement learning
Sequential decision making in an environment with evaluative feedback
- Environment: may be unknown, non-linear, stochastic and complex
- Agent: learns a policy to map states of the environment to actions; seeks to maximize long-term reward




RL: Evaluative Feedback
- Pick an action, receive a reward
- No supervision for what the correct action is or would have been (unlike supervised learning)




RL: Sequential Decisions
- Plan and execute actions over a sequence of states
- Reward may be delayed, requiring optimization of future rewards (long-term planning)




Signature Challenges in RL
- Evaluative Feedback: need trial and error to find the right action
- Delayed Feedback: actions may not lead to immediate reward
- Non-stationarity: the data distribution of visited states changes when the policy changes
- Fleeting Nature of online data: may only see data once




MDP
Framework underlying RL
- S: Set of states
- A: Set of actions
- R: Distribution of rewards
- T: Transition probability
- γ: Discount factor
- Markov Property: the current state completely characterizes the state of the environment
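One way to picture the five components is as a small container type. The sketch below is an assumption-laden illustration: the `MDP` class, its field names, and the "cold"/"hot" example are hypothetical, and rewards are simplified to one fixed number per (state, action) pair instead of a full distribution.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    states: list        # S
    actions: list       # A
    rewards: dict       # R: (s, a) -> reward (a single value here, for simplicity)
    transitions: dict   # T: (s, a) -> {s': probability}
    gamma: float        # discount factor

    def step(self, s, a, rng):
        """Sample s' from T(. | s, a). Only the current (s, a) is used, never the history."""
        dist = self.transitions[(s, a)]
        next_states = list(dist)
        weights = [dist[s2] for s2 in next_states]
        s_next = rng.choices(next_states, weights=weights, k=1)[0]
        return s_next, self.rewards[(s, a)]


# Hypothetical two-state example.
mdp = MDP(
    states=["cold", "hot"],
    actions=["wait", "heat"],
    rewards={("cold", "wait"): 0.0, ("cold", "heat"): -1.0,
             ("hot", "wait"): 1.0, ("hot", "heat"): -1.0},
    transitions={("cold", "wait"): {"cold": 1.0},
                 ("cold", "heat"): {"hot": 0.8, "cold": 0.2},
                 ("hot", "wait"): {"hot": 0.5, "cold": 0.5},
                 ("hot", "heat"): {"hot": 1.0}},
    gamma=0.9,
)

rng = random.Random(0)
print(mdp.step("cold", "heat", rng))
```

The `step` method takes only the current state and action, which is how the Markov property shows up in code.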




RL: Equations relating optimal quantities
1. V*(s) = max_a Q*(s, a)
2. pi*(s) = argmax_a Q*(s, a)
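Both relations are simple to apply once a Q* table is available. The table below is made up purely for illustration; its state names, action names, and values are not from the notes.

```python
# Hypothetical Q* table, Q[s][a]; the numbers are chosen only for illustration.
Q = {
    "s0": {"left": 1.0, "right": 3.0},
    "s1": {"left": 2.5, "right": 0.5},
}

# V*(s) = max_a Q*(s, a)
V = {s: max(q.values()) for s, q in Q.items()}

# pi*(s) = argmax_a Q*(s, a)
pi = {s: max(q, key=q.get) for s, q in Q.items()}

print(V)   # {'s0': 3.0, 's1': 2.5}
print(pi)  # {'s0': 'right', 's1': 'left'}
```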




V*(s)
V*(s) = max_a sum_{s'} p(s'|s, a) [r(s, a) + γ V*(s')]
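Applying the right-hand side of this equation repeatedly as an update rule gives value iteration. The sketch below assumes a made-up two-state MDP; the tables `p` and `r`, the discount `GAMMA`, and the function names are hypothetical.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical MDP: p[(s, a)] maps s' -> p(s' | s, a); r[(s, a)] is r(s, a).
p = {
    (0, "stay"): {0: 1.0},
    (0, "go"): {1: 1.0},
    (1, "stay"): {1: 1.0},
    (1, "go"): {0: 1.0},
}
r = {(0, "stay"): 0.0, (0, "go"): 1.0, (1, "stay"): 2.0, (1, "go"): 0.0}
STATES = [0, 1]
ACTIONS = ["stay", "go"]


def backup(V, s, a):
    """sum_{s'} p(s'|s, a) [r(s, a) + gamma * V(s')]"""
    return sum(prob * (r[(s, a)] + GAMMA * V[s2]) for s2, prob in p[(s, a)].items())


def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in STATES}
    while True:
        # Apply the max over actions to every state using the previous V.
        V_new = {s: max(backup(V, s, a) for a in ACTIONS) for s in STATES}
        if max(abs(V_new[s] - V[s]) for s in STATES) < tol:
            return V_new
        V = V_new


V_star = value_iteration()
print(V_star)
```

For this example the fixed point can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 * 20 = 19.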
