Exam (elaborations)

SO 2 Markov Decision Processes

Pages: 5
Grade: A
Uploaded on: 30-10-2024
Written in: 2024/2025

What is a Markov decision process (MDP) and what are its components?
An MDP is a model for sequential decision problems. It consists of:
- Decision epochs
- System states
- Actions
- Transition probabilities, which depend only on the present state and the present action
- Rewards

What are decision epochs? What is our notation for them and what restrictions do we impose?
Decision epochs are the points in time when decisions are made and actions are taken. T denotes the set of all decision epochs. We consider models where T = {t0, t1, ...} is a countable set and can be represented as N. Finite horizon: T = {1, ..., N}, a finite set of integers. Infinite horizon: T = N.

What are actions? What is our notation for them and what restrictions do we impose?
Actions are the means by which the agent's decisions affect the future behaviour of the system. A denotes the set of all actions available to the decision maker and is called the action space. Yt is the random variable representing the action taken at time t (even given all available information, the decision can still be randomized). We only consider models where the action set is finite.

What are states? What is our notation for them and what restrictions do we impose?
The state of a system is the information about the system, past and present, which, together with future actions, enables us to predict the system's future behaviour (uniquely in a statistical sense, i.e. as a distribution). S denotes the set of all states the system can be in. We restrict to the case where the state space is finite; Ns is the number of states. A(s) ⊂ A is the set of all admissible actions when the system is in state s.

What are the transition probabilities? What is our notation for them and what restrictions do we impose?
pt(·|s,a) is a parameterized family of PMFs on the state space, indexed by the current state s and the current action a.
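The components above can be sketched as plain data structures. The following is a minimal, hypothetical two-state example; the state names, actions, probabilities, and rewards are all invented for illustration:

```python
import random

# Hypothetical MDP with two states and (state-dependent) admissible actions A(s).
states = ["idle", "busy"]
actions = {"idle": ["wait", "start"],   # A(s): admissible actions per state
           "busy": ["wait"]}

# Stationary transition PMFs p(z | s, a): keyed by (state, action),
# each value maps a next state z to its probability.
p = {
    ("idle", "wait"):  {"idle": 1.0},
    ("idle", "start"): {"busy": 0.9, "idle": 0.1},
    ("busy", "wait"):  {"busy": 0.7, "idle": 0.3},
}

# Immediate rewards r(s, a).
r = {("idle", "wait"): 0.0, ("idle", "start"): -1.0, ("busy", "wait"): 2.0}

def step(s, a, rng=random):
    """Sample the next state from p(. | s, a) and return (next_state, reward)."""
    pmf = p[(s, a)]
    z = rng.choices(list(pmf), weights=list(pmf.values()))[0]
    return z, r[(s, a)]
```

Note that each PMF p(·|s,a) sums to 1, and the sampled next state depends only on the present (s, a) pair, matching the restriction stated above.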
pt(z|s,a) is the probability of the process transitioning to state z at time t+1, conditional on the system being in state s and action a being taken at time t. Transitions from one state to another obey a state-action Markov property assumption:

P(Xt+1 = st+1 | X0 = s0, Y0 = a0, ..., Xt = st, Yt = at) = pt(st+1 | st, at)

Essentially, the future of the process, given the present state and the present action taken, is independent of the past system states and past actions.

What are rewards? What is our notation for them and what restrictions do we impose?
Rewards are the immediate consequences of actions taken. rt(s,a) ∈ R is the reward received at time t if the system is in state s and the agent selects action a, both at time t.

What are decision rules?
Informally, a decision rule is a procedure for selecting an action in each state at the specified decision epoch. In selecting an action, the rule has access to the present state along with all past states and actions. Formally, a general decision rule is a distribution on the action set A. We consider four rule classes:
- History-dependent randomized (HR)
- History-dependent deterministic (HD)
- Memoryless randomized (MR)
- Memoryless deterministic (MD)
The rule classes are related as follows: MR ⊂ HR ⊃ HD ⊃ MD.

Describe HR decision rules.
The most general class: HR rules are a family of probability distributions on the action set, indexed by the decision epoch, the past and present of the underlying process, and the past of the decision process. A particular (time t) rule qt in the HR class is specified by the following probability equations:

P(Y0 = a0 | X0 = s0) = q0(a0 | s0) if t = 0
P(Yt = at | X0 = s0, ..., Xt = st, Y0 = a0, ..., Yt-1 = at-1) =: qt(at | Ht, Ft-1) if t > 0

where Ht is the state history (s0, ..., st) and Ft-1 is the action history (a0, ..., at-1). qt(·|·) is called the decision probability of a time t decision rule.
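An HR rule can be sketched as a function that maps the full history to a PMF over actions. This is only an illustrative sketch: the action names and the rule's concrete logic are invented, not part of the course material.

```python
import random

# History-dependent randomized (HR) rule q_t(a | H_t, F_{t-1}):
# it sees the full state history H_t and the full past-action history F_{t-1}.
def q_hr(state_history, action_history):
    """Return a PMF over actions {"a1", "a2"} given the full history (invented rule)."""
    if action_history and action_history[-1] == "a1":
        # Invented behaviour: lean toward repeating the last action taken.
        return {"a1": 0.8, "a2": 0.2}
    # At t = 0 there are no past actions, so this branch is q_0(a | s_0).
    return {"a1": 0.5, "a2": 0.5}

def sample_action(pmf, rng=random):
    """Draw one action according to the decision probabilities."""
    acts = list(pmf)
    return rng.choices(acts, weights=[pmf[a] for a in acts])[0]
```

The point of the sketch is the signature: unlike a memoryless rule, q_hr needs both history arguments, even though its output is still just a distribution on A.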
Describe HD decision rules.
A time t HD decision rule is a deterministic function of the past and present of the underlying process and the past of the decision process:

dt: (X0:t, Y0:t-1) -> A

dt is called the decision function at time t; this function constitutes the rule.

Describe MR decision rules.
For a memoryless randomized decision rule, the choice of action does not depend on past states or past actions; it depends solely on the present state of the underlying system, that is:

P(Yt = at | X0 = s0, ..., Xt = st, Y0 = a0, ..., Yt-1 = at-1) = P(Yt = at | Xt = st) =: qt(at | st)

A time t rule in this class is determined by specifying the decision probabilities qt(at | st) for all at in A and st in S, i.e. how the rule chooses an action in every possible scenario.
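A memoryless randomized rule, by contrast, needs only the current state. The sketch below (with invented states, actions, and probabilities) also shows how an MD rule arises as the degenerate case where a state's PMF puts all mass on one action:

```python
import random

# Memoryless randomized (MR) rule: q(a | s) depends only on the current state.
q_mr = {
    "s1": {"a1": 0.3, "a2": 0.7},
    "s2": {"a1": 1.0},  # degenerate PMF: effectively an MD rule in state s2
}

def act(state, rng=random):
    """Sample an action from q_mr(. | state); the history is never consulted."""
    pmf = q_mr[state]
    return rng.choices(list(pmf), weights=list(pmf.values()))[0]
```

Because `act` takes only the state, it cannot distinguish two trajectories that arrive at the same state, which is exactly the memoryless restriction.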
