Reinforcement Learning and Dynamic Programming
Final Exam - Summer 2022
Duration: 150 minutes
Name Surname: Bilkent ID: Signature:
Q1: Pacman Bonus Level!
[Figure: a 5 x 1 row of cells numbered 1 through 5 from left to right, each containing a dot.]
Pacman is in a bonus level! With no ghosts around, he can eat as many dots as he wants. He is in
the 5 x 1 grid shown above, where the cells are numbered from left to right, that is, s ∈ {1, ..., 5}.
In cells 1 through 4, the actions available are to move Right (R) or to Fly (F) out of the bonus
level. The action Right deterministically lands Pacman in the cell to the right (and he eats the
dot there), while the Fly action deterministically lands him in a terminal state and ends the game.
From cell 5, Fly is the only action. Eating a dot gives a reward of +10, while flying out gives a
reward of +20.
(a) (4 pts) How many deterministic policies are there in the above MDP?
Each of cells 1 through 4 has two available actions (R or F), and cell 5 has only one, so the number of deterministic policies is 2^4 = 16.
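The count can be double-checked by brute-force enumeration; a minimal Python sketch (the `actions` mapping and `policies` list are illustrative names, not part of the exam):

```python
from itertools import product

# Available actions per cell: cells 1-4 allow Right (R) or Fly (F);
# cell 5 allows only Fly.
actions = {s: ("R", "F") for s in range(1, 5)}
actions[5] = ("F",)

# A deterministic policy fixes one action for every state.
policies = list(product(*(actions[s] for s in range(1, 6))))
print(len(policies))  # 2 * 2 * 2 * 2 * 1 = 16
```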
Consider the following policies for 0 ≤ i ≤ 4: π_i(s) = R if s ≤ i, F otherwise.
(b) (12 pts) Find the values v_{π_0}(1), v_{π_1}(1), and v_{π_2}(1) for the discount factor γ = 1, and
fill out the table. Show your work.
v_{π_0}(1) | 20
v_{π_1}(1) | 30
v_{π_2}(1) | 40

v_{π_0}(1) = 20 + γ(0) = 20
v_{π_1}(1) = 10 + γ(20) = 30
v_{π_2}(1) = 10 + γ(10) + γ^2(20) = 40
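These returns can be verified by rolling out each policy under the deterministic dynamics stated in the problem; a small sketch (the helpers `evaluate` and `pi` are made-up names for illustration):

```python
def evaluate(policy, gamma=1.0, start=1):
    """Discounted return of a deterministic policy from cell `start`.

    policy: dict mapping cell -> "R" or "F".
    Moving Right eats the dot in the next cell (+10); Fly gives +20
    and ends the game.
    """
    s, total, discount = start, 0.0, 1.0
    while True:
        if policy[s] == "F":
            return total + discount * 20
        total += discount * 10   # eat the dot in the cell to the right
        discount *= gamma
        s += 1

def pi(i):
    # pi_i: Right in cells 1..i, Fly everywhere else.
    return {s: ("R" if s <= i else "F") for s in range(1, 6)}

for i in range(3):
    print(evaluate(pi(i)))  # 20.0, 30.0, 40.0
```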
(c) (10 pts) For what range of γ is π_4 the optimal policy (that is, π_4 is strictly better than
π_0, π_1, π_2, and π_3)?
The value of each policy at state 1, as a function of γ:

π_4 → 10 + 10γ + 10γ^2 + 10γ^3 + 20γ^4
π_3 → 10 + 10γ + 10γ^2 + 20γ^3
π_2 → 10 + 10γ + 20γ^2
π_1 → 10 + 20γ
π_0 → 20

For v_{π_4}(1) > v_{π_3}(1) we need 10γ^3 + 20γ^4 > 20γ^3, i.e., γ > 1/2.
For v_{π_3}(1) > v_{π_2}(1) we need 10γ^2 + 20γ^3 > 20γ^2, i.e., γ > 1/2.
For v_{π_2}(1) > v_{π_1}(1) we need 10γ + 20γ^2 > 20γ, i.e., γ > 1/2.
For v_{π_1}(1) > v_{π_0}(1) we need 10 + 20γ > 20, i.e., γ > 1/2.

So, for γ > 1/2, we have v_{π_4}(1) > v_{π_3}(1) > v_{π_2}(1) > v_{π_1}(1) > v_{π_0}(1),
and π_4 is the optimal policy.
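The threshold can be sanity-checked numerically; a short sketch (the helper `v` is a hypothetical name) that evaluates each v_{π_i}(1) as a polynomial in γ:

```python
def v(i, gamma):
    # v_{pi_i}(1): eat i dots (+10 each, discounted), then Fly (+20).
    return sum(10 * gamma**k for k in range(i)) + 20 * gamma**i

# Below gamma = 1/2 the ordering reverses, at gamma = 1/2 all five
# values coincide (= 20), and above it v4 > v3 > v2 > v1 > v0.
for gamma in (0.4, 0.5, 0.6):
    print(gamma, [round(v(i, gamma), 3) for i in range(5)])
```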