CS 7643 Verified Multiple Choice and Conceptual Exam Questions with Detailed Answers (2025/2026 Update)

Type: Exam (elaborations)
Grade: A+
Pages: 18
Uploaded on: September 18, 2025
Written in: 2025/2026





Q1. What is reinforcement learning (RL)?
A. Learning from labeled datasets to minimize error.
B. Sequential decision-making with evaluative feedback in an environment.
C. Learning embeddings for words and graphs.
D. Clustering unlabeled data points.
Answer: B


Q2. In RL, what does the agent do?
A. Learns embeddings from data.
B. Learns a policy to map states to actions to maximize long-term rewards.
C. Provides supervision to the environment.
D. Directly controls the reward function.
Answer: B


Q3. In RL, what distinguishes evaluative feedback from supervised learning
feedback?
A. Supervised learning provides rewards; RL provides labels.
B. RL provides correct labels for each action.

C. In RL, the agent only receives a reward but not the correct action.
D. Both provide direct error signals for optimal actions.
Answer: C
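The distinction in Q3 can be made concrete with a small sketch (an invented two-armed bandit, used only for illustration): the agent is given a noisy scalar reward for the arm it pulled, never a label saying which arm was correct, and must rank the arms by trial and error.

```python
import random

random.seed(0)
true_means = [0.2, 0.8]  # hidden from the agent: arm 1 is better
q = [0.0, 0.0]           # the agent's estimated value of each arm
n = [0, 0]               # pull counts
epsilon = 0.1

for t in range(2000):
    # Epsilon-greedy: explore with probability epsilon, else exploit.
    if random.random() < epsilon:
        a = random.randrange(2)
    else:
        a = 0 if q[0] >= q[1] else 1
    # Evaluative feedback: a noisy reward, not the correct action.
    r = true_means[a] + random.gauss(0, 0.1)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]  # incremental mean update

print(q)  # the estimate for arm 1 ends up above arm 0
```

Note that the update only ever uses the sampled reward `r`; a supervised learner would instead be told "the correct arm was 1" at every step.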


Q4. Why is RL considered sequential decision-making?
A. Each decision is independent of prior states.
B. Actions have no long-term consequences.
C. The agent must plan actions over sequences of states, sometimes with delayed
rewards.
D. It only applies to static datasets.
Answer: C


Q5. Which of the following is not a core challenge in RL?
A. Evaluative feedback (trial-and-error learning).
B. Delayed feedback (rewards not immediate).
C. Non-stationarity (policy changes environment distribution).
D. Full supervision (true labels provided at each step).
Answer: D


Q6. What does non-stationarity mean in RL?
A. Rewards are fixed regardless of state.
B. The distribution of visited states changes as the policy evolves.
C. The environment resets after each action.
D. The transition probabilities remain constant.
Answer: B


Q7. What is the Markov property in RL?
A. The next state depends on the entire history of states and actions.
B. The current state fully characterizes the environment.

C. Rewards are always immediate and fixed.
D. Actions are chosen independently of states.
Answer: B
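Q7's point is that the transition dynamics take only the current state as input, never the history. The simulator below (a made-up two-state weather chain, purely illustrative) makes that explicit: `step()` receives nothing but the current state.

```python
import random

random.seed(0)
# Transition probabilities of a two-state Markov chain (made-up numbers).
P = {"sunny": {"sunny": 0.9, "rainy": 0.1},
     "rainy": {"sunny": 0.5, "rainy": 0.5}}

def step(state):
    """Sample the next state. Only `state` is needed, not the
    trajectory so far -- that is the Markov property."""
    r = random.random()
    cum = 0.0
    for nxt, p in P[state].items():
        cum += p
        if r < cum:
            return nxt
    return nxt  # guard against floating-point round-off

trajectory = ["sunny"]
for _ in range(10):
    trajectory.append(step(trajectory[-1]))
print(trajectory)
```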


Q8. Which components define an MDP?
A. States, Actions, Rewards, Transition probabilities, Discount factor.
B. Loss function, Optimizer, Training data, Validation set.
C. Embeddings, Hidden states, Outputs, Weights.
D. Layers, Activations, Gradients, Loss.
Answer: A
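The five components in option A can be written down directly as a data structure. The two-state MDP below is a made-up example, used only to make the tuple (S, A, P, R, γ) concrete.

```python
# A minimal MDP as plain data: (S, A, P, R, gamma).
states = ["s0", "s1"]
actions = ["stay", "go"]
gamma = 0.9  # discount factor

# Transition probabilities: P[s][a] maps next state s' to p(s'|s,a).
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}
# Rewards: R[s][a] = r(s, a).
R = {
    "s0": {"stay": 0.0, "go": 1.0},
    "s1": {"stay": 2.0, "go": 0.0},
}

# Sanity check: every action's transition distribution sums to 1.
for s in states:
    for a in actions:
        assert abs(sum(P[s][a].values()) - 1.0) < 1e-9
print("MDP is well-formed")
```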
Q9. The Bellman optimality equation for the state-value function $V^*(s)$ is:
A. $V^*(s) = \max_a \sum_{s'} p(s' \mid s, a)\,[r(s,a) + \gamma V^*(s')]$
B. $V^*(s) = \sum_{s'} p(s' \mid s)\, r(s)$
C. $V^*(s) = \min_a Q(s,a)$
D. $V^*(s) = r(s) + V^*(s)$
Answer: A
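Option A is exactly the backup applied by value iteration. The sketch below runs it to convergence on a toy two-state MDP (invented for illustration; same structure as the (S, A, P, R, γ) tuple from Q8).

```python
# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a sum_{s'} p(s'|s,a) * (r(s,a) + gamma * V(s'))
gamma = 0.9
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {"s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
     "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}}}
R = {"s0": {"stay": 0.0, "go": 1.0},
     "s1": {"stay": 2.0, "go": 0.0}}

V = {s: 0.0 for s in states}
for _ in range(500):
    # Synchronous backup: the comprehension reads the old V throughout.
    V = {s: max(sum(p * (R[s][a] + gamma * V[s2])
                    for s2, p in P[s][a].items())
                for a in actions)
         for s in states}
print(V)
```

For this MDP the fixed point can be checked by hand: staying in s1 forever yields $2/(1-\gamma) = 20$, and going from s0 yields $1 + 0.9 \cdot 20 = 19$.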


Q10. The Bellman optimality equation for the action-value function $Q^*(s,a)$ is:
A. $Q^*(s,a) = \sum_{s'} p(s' \mid s, a)\, r(s,a)$
B. $Q^*(s,a) = \sum_{s'} p(s' \mid s, a)\,[r(s,a) + \gamma \max_{a'} Q^*(s', a')]$
C. $Q^*(s,a) = V^*(s) + r(s)$
D. $Q^*(s,a) = \max_s V(s)$
Answer: B
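Option B is likewise the backup used by Q-value iteration, and the greedy policy can be read off the converged $Q^*$ directly with $\pi^*(s) = \arg\max_a Q^*(s,a)$. The sketch below reuses the same invented two-state MDP.

```python
# Q-value iteration:
#   Q(s,a) <- sum_{s'} p(s'|s,a) * (r(s,a) + gamma * max_{a'} Q(s',a'))
gamma = 0.9
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {"s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
     "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}}}
R = {"s0": {"stay": 0.0, "go": 1.0},
     "s1": {"stay": 2.0, "go": 0.0}}

Q = {s: {a: 0.0 for a in actions} for s in states}
for _ in range(500):
    Q = {s: {a: sum(p * (R[s][a] + gamma * max(Q[s2].values()))
                    for s2, p in P[s][a].items())
             for a in actions}
         for s in states}

# The optimal policy is greedy with respect to Q*.
greedy = {s: max(Q[s], key=Q[s].get) for s in states}
print(greedy)
```

Unlike the $V^*$ form in Q9, the max here is taken inside the expectation over the *next* action $a'$, which is what lets model-free methods such as Q-learning estimate it from samples.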
