CS 234 Assignment 2: All Answers 100% Correct (latest, Jan 2022)
Due date: 2/06 (Wed) 11:59 PM (23:59) PST

These questions require thought, but do not require long answers. Please be as concise as possible.

We encourage students to discuss in groups for assignments. We ask that you abide by the university Honor Code and that of the Computer Science department. If you have discussed the problems with others, please include a statement saying who you discussed the problems with. Failure to follow these instructions will be reported to the Office of Community Standards. We reserve the right to run fraud-detection software on your code. Please refer to the Academic Collaboration and Misconduct section of the course website for details about the collaboration policy.

Please review any additional instructions posted on the assignment page. When you are ready to submit, please follow the instructions on the course website. Make sure you test your code using the provided commands and do not edit outside of the marked areas.

You'll need to download the starter code and fill in the appropriate functions following the instructions in the handout and the code's documentation. Training DeepMind's network on Pong takes roughly 12 hours on a GPU, so please start early! (Only a completed run will receive full credit.) We will give you access to an Azure GPU cluster. You'll find the setup instructions on the course assignment page.

Introduction

In this assignment we will implement deep Q-learning, following DeepMind's papers ([mnih2015human] and [mnih-atari-2013]) that learn to play Atari games from raw pixels. The purpose is to understand the effectiveness of deep neural networks as well as some of the techniques used in practice to stabilize training and achieve better performance. You'll also have to get comfortable with TensorFlow. We will train our networks on the Pong-v0 environment from OpenAI Gym, but the code can easily be applied to any other environment.

In Pong, one player wins a game if the ball passes by the other player.
Winning a game gives a reward of +1, while losing gives a reward of -1. An episode is over when one of the two players reaches 21 wins. Thus, the final score lies between -21 (lost episode) and +21 (won episode). Our agent plays against a decent hard-coded AI player. Average human performance is
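The introduction describes deep Q-learning on Pong, where each game won or lost yields a reward of +1 or -1. As a minimal sketch in plain Python (not the assignment's TensorFlow starter code), the helpers below compute the one-step regression targets y = r + γ · max_a' Q_target(s', a') that a DQN-style Q-network is trained toward, dropping the bootstrap term on terminal transitions, and the episode score as a sum of rewards. The function names (`q_targets`, `mse_loss`, `episode_score`), the discount γ = 0.99, and the plain squared-error loss are illustrative assumptions, not the handout's API or the papers' exact training setup.

```python
from typing import List, Sequence


def q_targets(rewards: Sequence[float],
              next_q_values: Sequence[Sequence[float]],
              dones: Sequence[bool],
              gamma: float = 0.99) -> List[float]:
    """One-step Q-learning targets y_i = r_i + gamma * max_a' Q_target(s'_i, a').

    rewards[i]       -- reward for transition i (+1 / -1 when a Pong game ends, else 0)
    next_q_values[i] -- target-network estimates for every action in the next state s'_i
    dones[i]         -- True if s'_i is terminal; then no bootstrap term is added
    gamma            -- discount factor (0.99 is an assumed default)
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def mse_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between Q(s_i, a_i) from the online network and y_i."""
    return sum((p - y) ** 2 for p, y in zip(predicted, targets)) / len(targets)


def episode_score(rewards: Sequence[float]) -> float:
    """Sum of per-step rewards; for a finished Pong episode this lies in [-21, 21]."""
    return sum(rewards)


if __name__ == "__main__":
    # Two hypothetical transitions in an environment with three actions.
    y = q_targets(rewards=[1.0, 0.0],
                  next_q_values=[[0.5, 0.2, 0.1], [0.3, 0.9, 0.4]],
                  dones=[True, False])
    print(y)  # first target is 1.0 (terminal), second is 0.99 * 0.9
    print(mse_loss([0.8, 0.7], y))
    # Hypothetical episode: the agent wins 21 games and loses 15.
    print(episode_score([1.0] * 21 + [-1.0] * 15))  # 6.0
```

Feeding `next_q_values` from a separate, periodically updated target network, rather than from the network being trained, is one of the stabilization techniques from the 2015 DQN paper; in this sketch those values are simply passed in.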
Document information
- Uploaded on: December 30, 2021
- Number of pages: 10
- Written in: 2021/2022
- Type: Exam (elaborations)
- Contains: Questions & answers