Exam (elaborations)

Quiz 5: CS 7643 Deep Learning | Questions with Verified Answers | 100% Correct | Latest 2025/2026 Update - Georgia Institute of Technology.

Pages: 16
Grade: A+
Uploaded on: 27-03-2025
Written in: 2024/2025
Course: CS 7643 Deep Learning
Institution: Georgia Institute of Technology


Document information

Type: Exam (elaborations)
Contains: Questions & answers

Subjects

  • quiz 5 cs 7643

Content preview

Quiz 5: CS 7643 Deep Learning | Questions with Verified Answers | 100% Correct | Latest 2025/2026 Update - Georgia Institute of Technology.

Why does Policy Iteration π_i often converge to π* sooner than V_π converges to V_π*? - The greedy policy typically stops changing (reaches π*) while the value estimates are still far from V_π*; the policy thus requires few iterations.
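One way to see this numerically: run value backups on a toy problem and record when the greedy policy first becomes optimal versus when the values themselves converge. A minimal pure-Python sketch (the 2-state MDP, its rewards, and all variable names are my own illustration, not from the quiz):

```python
# Toy deterministic MDP: P[s][a] = (next_state, reward). Illustrative only.
GAMMA = 0.9
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 2.0)}}
V_STAR = {0: 19.0, 1: 20.0}   # analytic fixed point: always take action 1
PI_STAR = {0: 1, 1: 1}

def greedy(V):
    # Greedy policy with respect to the current value estimates
    return {s: max(P[s], key=lambda a: P[s][a][1] + GAMMA * V[P[s][a][0]])
            for s in P}

V = {0: 0.0, 1: 0.0}
policy_optimal_at = values_converged_at = None
for i in range(1, 300):
    # One Bellman optimality backup
    V = {s: max(P[s][a][1] + GAMMA * V[P[s][a][0]] for a in P[s]) for s in P}
    if policy_optimal_at is None and greedy(V) == PI_STAR:
        policy_optimal_at = i          # policy already optimal here ...
    if values_converged_at is None and all(abs(V[s] - V_STAR[s]) < 1e-6 for s in P):
        values_converged_at = i        # ... values converge much later
```

On this toy problem the greedy policy is optimal after the very first backup, while the value estimates need on the order of a hundred more sweeps to reach the same tolerance.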




Deep Q-Learning - Q(s, a; w, b) = w_a^T s + b_a

MSE Loss := (Q_new(s, a) - (r + γ · max_a' Q_old(s', a')))^2

- Using a single Q function makes the loss function unstable
--> use two Q-tables (NNs):
- Freeze Q_old and update Q_new
- Set Q_old = Q_new at regular intervals
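A minimal numeric sketch of the two-network trick with the linear parameterization Q(s, a; w, b) = w_a^T s + b_a above (the weights, the sample transition, and the function names are my own illustration):

```python
GAMMA = 0.99
N_ACTIONS, DIM = 2, 3

def q_value(params, s, a):
    # Linear Q: Q(s, a; w, b) = w_a . s + b_a
    w, b = params
    return sum(wi * si for wi, si in zip(w[a], s)) + b[a]

def td_target(old_params, r, s_next):
    # r + gamma * max_a' Q_old(s', a'), evaluated with the FROZEN network
    return r + GAMMA * max(q_value(old_params, s_next, a) for a in range(N_ACTIONS))

def mse_loss(new_params, old_params, transition):
    # (Q_new(s, a) - (r + gamma * max_a' Q_old(s', a')))^2
    s, a, r, s_next = transition
    return (q_value(new_params, s, a) - td_target(old_params, r, s_next)) ** 2

new_params = ([[0.1] * DIM for _ in range(N_ACTIONS)], [0.0] * N_ACTIONS)
# "Set Q_old = Q_new at regular intervals" amounts to copying the parameters:
old_params = ([row[:] for row in new_params[0]], new_params[1][:])
loss = mse_loss(new_params, old_params, ([1.0, 0.0, 1.0], 0, 1.0, [0.0, 1.0, 0.0]))
```

Because the target is computed with the frozen copy, it does not move every time the new parameters are updated, which is what stabilizes the loss.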




Fitted Q-Iteration - Algorithm to optimize the MSE loss on a fixed dataset
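A tabular sketch of the idea (the fixed dataset and the toy reward values are my own; with one sample per state-action pair the "regression fit" reduces to copying the targets):

```python
GAMMA = 0.9
ACTIONS = [0, 1]
# Fixed dataset of (s, a, r, s') transitions, collected once beforehand.
DATA = [(0, 0, 0.0, 1), (0, 1, 1.0, 0), (1, 0, 0.0, 0), (1, 1, 2.0, 1)]

Q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
for _ in range(100):
    # Regression targets computed from the previous Q fit
    targets = {(s, a): r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
               for (s, a, r, s2) in DATA}
    # Fit step: minimize MSE to the targets (trivial in the tabular case)
    Q.update(targets)
```

Each sweep uses the whole fixed dataset; no new environment interaction happens during optimization.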

RL: How to Collect Data -
Challenge 1: Exploration vs. Exploitation
Challenge 2: Non-iid, highly correlated data
- This leads to high variance in gradients and inefficient learning
- Experience Replay addresses this:
--> store (s, a, s', r) tuples and continually update episodes (older samples discarded)
--> train the Q-network on random mini-batches of transitions from the replay memory instead of consecutive examples
--> the larger the buffer, the lower the correlation
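Challenge 1 is commonly handled with an ε-greedy behavior policy: act randomly with probability ε, greedily otherwise. A small sketch (the ε value, Q-values, and names are my own illustration):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

random.seed(0)
# Action 1 has the highest Q-value, so it should dominate but not monopolize.
actions = [epsilon_greedy([0.0, 1.0, 0.5], epsilon=0.2) for _ in range(1000)]
```

With ε = 0.2 the greedy action is taken roughly 87% of the time (80% exploitation plus its share of the uniform exploration draws), while the other actions still get tried.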




Experience Replay - store (s, a, s', r) tuples and continually update episodes (older samples discarded)




Reinforcement learning - Sequential decision making in an environment with evaluative feedback

Environment: may be unknown, non-linear, stochastic and complex
Agent: learns a policy to map states of the environment to actions
- seeks to maximize long-term reward




RL: Evaluative Feedback - Pick an action, receive a reward
- No supervision for what the correct action is or would have been (unlike supervised learning)




RL: Sequential Decisions - Plan and execute actions over a sequence of states
- Reward may be delayed, requiring optimization of future rewards (long-term planning)
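"Optimization of future rewards" is usually made precise via the discounted return G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + ...; a small sketch (the reward sequence is my own example):

```python
GAMMA = 0.9

def discounted_return(rewards, gamma=GAMMA):
    # G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    # Accumulating backwards turns the sum into a simple recurrence.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# All reward arrives at the final step, yet it still shapes the return now:
g = discounted_return([0.0, 0.0, 0.0, 1.0])   # 0.9**3 = 0.729
```

The delayed reward propagates back through the discount factor, which is exactly why actions taken now can be credited for rewards that arrive much later.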




Signature Challenges in RL -
Evaluative Feedback: need trial and error to find the right action
Delayed Feedback: actions may not lead to immediate reward
Non-stationarity: the data distribution of visited states changes when the policy changes
Fleeting Nature: of online data (may only see each sample once)




MDP - Framework underlying RL
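An MDP is commonly written as the tuple (S, A, P, R, γ): states, actions, transition model, reward function, and discount factor. A minimal encoding (the toy numbers are my own illustration):

```python
# MDP as (S, A, P, R, gamma): P maps (s, a) -> [(probability, next_state), ...]
# and R maps (s, a) -> reward.
S = [0, 1]
A = [0, 1]
P = {(0, 0): [(1.0, 0)], (0, 1): [(0.8, 1), (0.2, 0)],
     (1, 0): [(1.0, 0)], (1, 1): [(1.0, 1)]}
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
GAMMA = 0.9

def check_mdp():
    # Sanity check: transition probabilities out of every (s, a) sum to 1
    return all(abs(sum(p for p, _ in P[(s, a)]) - 1.0) < 1e-9
               for s in S for a in A)
```

Everything in the preceding cards (policies, value functions, Q-learning targets) is defined with respect to such a tuple.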
