Readings and lectures for the first week are below.
Exercises 1.1-1.5. Due: Friday, January 20th, at the beginning of class.
Programming assignment 1: as above. Due: Friday, January 20th, at 11:59PM. Please email me a PDF report on your project (two pages is probably plenty), with graphs included in the PDF, along with your code and instructions on how to run it. Thanks! Here is a discussion page for programming assignments.
Introduction to exploration and value learning.
Chapter 1 and 2.1-2.3
IF YOU HAVEN'T BEEN GETTING CLASS EMAILS: check your spam folder, and if they aren't there, EMAIL ME NOW!!! email@example.com
2.4 through 2.11.
Note: I don't lecture on 2.8-2.10, but they are "good for you!"
Today I got as far as 3.4.
CSE room 3219
Exercises (DUE MONDAY, January
1: Exercise 2.1 (should be easy - you can simulate it!).
1A (extra credit): Exercise 2.2, using the softmax from the previous programming assignment.
2: Exercise 2.3. Hint: with two actions, the softmax reduces to the sigmoid 1/(1 + e^(-(Q(a)-Q(b))/tau)); see the sketch after this list.
3: Exercise 3.1
4: Exercise 3.5
5: Fill in the missing steps between the last two equations on slide 45. [Hint: it is helpful to use the guide to notation here.]
(Fun with programming:) This is simply a suggestion, not an assignment! Try to replicate Figure 2.5. I suggest initializing the reference reward optimistically.
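For the bandit exercises above, here is a minimal Python sketch of softmax (Boltzmann) action selection on a simulated 10-armed testbed in the style of Chapter 2. The temperature, run length, and seed are assumed values for illustration, not required settings.

    import numpy as np

    rng = np.random.default_rng(0)
    n_arms, n_steps, tau = 10, 1000, 0.1      # tau is the temperature (assumed value)
    q_true = rng.normal(0.0, 1.0, n_arms)     # true action values, as in the Ch. 2 testbed
    Q = np.zeros(n_arms)                      # sample-average estimates
    N = np.zeros(n_arms)                      # pull counts

    for t in range(n_steps):
        # Softmax probabilities: P(a) = e^(Q(a)/tau) / sum_b e^(Q(b)/tau)
        prefs = Q / tau - (Q / tau).max()     # subtract the max for numerical stability
        p = np.exp(prefs) / np.exp(prefs).sum()
        a = rng.choice(n_arms, p=p)
        r = rng.normal(q_true[a], 1.0)        # reward = true value + unit-variance noise
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]             # incremental sample-average update

    print("best arm:", q_true.argmax(), " most pulled:", N.argmax())

With two actions, this selection rule reduces to the sigmoid in exercise 2 above, since P(a) = 1/(1 + e^(-(Q(a)-Q(b))/tau)).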
PROGRAMMING ASSIGNMENT 2, Due MONDAY, 02/06/2012, 11:59PM:
Implement the algorithm for policy evaluation in Figure 4.1. Apply it to Example 4.1, using the equiprobable random policy (all actions equally likely). Do NOT make your algorithm so specific that this is the only example it can be applied to!
Then, using the same code with minor modifications (as a separate function, obviously), implement value iteration (Section 4.4) and apply it to the same problem.
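A minimal Python sketch of both algorithms, hard-coding the 4x4 grid of Example 4.1 (reward -1 per step, terminal corners, undiscounted) rather than reading it from files; for the assignment itself, the grid should come from the supplied file readers.

    import numpy as np

    # 4x4 gridworld of Example 4.1: states 0..15, terminals at 0 and 15,
    # reward -1 on every transition, gamma = 1 (undiscounted, episodic).
    N = 4
    TERMINALS = {0, N * N - 1}
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

    def step(s, a):
        # Deterministic move; stepping off the grid leaves the state unchanged.
        r, c = divmod(s, N)
        nr, nc = r + a[0], c + a[1]
        return nr * N + nc if 0 <= nr < N and 0 <= nc < N else s

    def policy_evaluation(theta=1e-6):
        # Iterative policy evaluation (Figure 4.1), equiprobable random policy.
        V = np.zeros(N * N)
        while True:
            delta = 0.0
            for s in set(range(N * N)) - TERMINALS:
                v = sum(0.25 * (-1 + V[step(s, a)]) for a in ACTIONS)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                return V

    def value_iteration(theta=1e-6):
        # Value iteration (Section 4.4): the same sweep, but a max over actions.
        V = np.zeros(N * N)
        while True:
            delta = 0.0
            for s in set(range(N * N)) - TERMINALS:
                v = max(-1 + V[step(s, a)] for a in ACTIONS)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                return V

    print(policy_evaluation().reshape(N, N))
    print(value_iteration().reshape(N, N))

Note that the only change between the two functions is the expectation becoming a max, which is why a separate function with minor modifications is all you need.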
I will supply you with Matlab code for reading the gridworld from files, which should be adaptable for this assignment. These files are from an old assignment and may need updating for your task (but maybe not!). NOTE: There is a README.txt in this directory that doesn't show up in my browser when I click the link above; I don't know why. It is there: if you append "README.txt" to the link above, you get it.
2, continued: finish Chapter 3 (I hope!)
Chapter 4.1-4.4
(We will discuss answers to the homeworks above at the beginning of this class.)
Dynamic Programming Methods
Chapter 4, completed
Programming assignment deadline pushed back to Monday!
Chapter 5 up to 5.4
Programming Assignment Due!!
Chapter 5, continued
5.5 to 5.8
Chapter 5, completed; beginning Chapter 6: TD methods
HOMEWORK: Exercises 6.1-6.4 DUE
Chapter 6, continued.
Programming assignment, due Wednesday, 02/22/2012, 11:59PM:
Implement SARSA (on-policy learning) and Q-learning (off-policy), and apply them to the cliff-walking problem. Reproduce Figure 6.13 (both parts: show both the learning curves and the policies learned).
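A minimal Python sketch of the two update rules on the cliff-walking grid of Example 6.6; alpha, epsilon, and the episode count are assumed values, so adjust them when reproducing the figure.

    import numpy as np

    # Cliff walking (Example 6.6): 4x12 grid, start (3,0), goal (3,11).
    # Stepping into the cliff (row 3, columns 1-10) gives -100 and resets
    # to the start; every other step gives -1. Undiscounted.
    ROWS, COLS = 4, 12
    START, GOAL = (3, 0), (3, 11)
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    rng = np.random.default_rng(0)

    def step(s, a):
        r = min(max(s[0] + MOVES[a][0], 0), ROWS - 1)
        c = min(max(s[1] + MOVES[a][1], 0), COLS - 1)
        if r == 3 and 0 < c < COLS - 1:       # fell off the cliff
            return START, -100
        return (r, c), -1

    def eps_greedy(Q, s, eps):
        return rng.integers(4) if rng.random() < eps else int(np.argmax(Q[s]))

    def run(method, episodes=500, alpha=0.5, eps=0.1):
        Q = np.zeros((ROWS, COLS, 4))
        returns = []
        for _ in range(episodes):
            s, total = START, 0
            a = eps_greedy(Q, s, eps)
            while s != GOAL:
                s2, r = step(s, a)
                total += r
                a2 = eps_greedy(Q, s2, eps)
                if method == "sarsa":         # on-policy: bootstrap on the action taken
                    target = r + Q[s2][a2]
                else:                         # Q-learning: bootstrap on the greedy max
                    target = r + Q[s2].max()
                Q[s][a] += alpha * (target - Q[s][a])
                s, a = s2, a2
            returns.append(total)
        return Q, returns

    for m in ("sarsa", "q-learning"):
        Q, rets = run(m)
        print(m, "mean return, last 100 episodes:", np.mean(rets[-100:]))

SARSA should learn the safer path away from the cliff while Q-learning learns the optimal path along the cliff edge, which is exactly the contrast Figure 6.13 shows.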
Chapter 6, continued.
Essay by Terry Sejnowski
Chapter 7: Eligibility Traces
No Class: President's Day
Finish Chapter 7.
No Class: I will be at CoSyNe
No Class: Still at CoSyNe
Chapter 8: Function Approximation
Read Chapter 8
More function approximation
Read Chapter 9
No Class: In DC for PI's meeting
Programming Assignment: Apply linear function approximation to the Mountain Car problem. A Matlab interface (needs some work) is here. A Python version (also needs work) is here. I suggest you use RBFs for the state space, or tile it.
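Here is a minimal Python sketch of RBF features with semi-gradient SARSA on the standard mountain-car dynamics from the book. The grid size, bandwidths, step size, and epsilon are assumptions to tune, and the hard-coded dynamics would be replaced by the Matlab or Python interface above.

    import numpy as np

    # Standard mountain-car dynamics (Sutton & Barto): actions {-1, 0, +1},
    # reward -1 per step, episode ends when the position reaches 0.5.
    P_MIN, P_MAX, V_MIN, V_MAX = -1.2, 0.5, -0.07, 0.07
    rng = np.random.default_rng(0)

    def step(pos, vel, a):
        vel = np.clip(vel + 0.001 * a - 0.0025 * np.cos(3 * pos), V_MIN, V_MAX)
        pos = np.clip(pos + vel, P_MIN, P_MAX)
        if pos <= P_MIN:
            vel = 0.0                          # inelastic left wall
        return pos, vel

    # RBF features: a fixed 8x8 grid of Gaussian bumps over (position, velocity).
    centers = np.array([(p, v) for p in np.linspace(P_MIN, P_MAX, 8)
                               for v in np.linspace(V_MIN, V_MAX, 8)])
    widths = np.array([(P_MAX - P_MIN) / 8, (V_MAX - V_MIN) / 8])

    def phi(pos, vel):
        d = (centers - np.array([pos, vel])) / widths
        return np.exp(-0.5 * (d ** 2).sum(axis=1))

    def pick(w, x, eps):
        return rng.integers(3) if rng.random() < eps else int(np.argmax(w @ x))

    # Semi-gradient SARSA with a linear Q: Q(s,a) = w[a] . phi(s).
    w = np.zeros((3, len(centers)))            # one weight vector per action
    alpha, eps = 0.05, 0.1
    for episode in range(200):
        pos, vel = rng.uniform(-0.6, -0.4), 0.0   # the book's start distribution
        x = phi(pos, vel)
        a = pick(w, x, eps)
        for t in range(5000):
            pos, vel = step(pos, vel, a - 1)      # map index {0,1,2} to action {-1,0,+1}
            if pos >= P_MAX:                      # reached the goal: terminal update
                w[a] += alpha * (-1 - (w @ x)[a]) * x
                break
            x2 = phi(pos, vel)
            a2 = pick(w, x2, eps)
            w[a] += alpha * (-1 + (w @ x2)[a2] - (w @ x)[a]) * x
            x, a = x2, a2

If you prefer tiling, swap phi for a tile-coding featurizer (the book's coarse-coding approach in Chapter 8); the SARSA update stays the same.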
Read Chapter 10
Chapter 11 (Sutton's slides)
Read Chapter 11
Optional additional reading: Original sources for three of the examples:
Sutton (1998) (acrobot)
Crites & Barto (elevator)
Zhang & Dietterich (job-shop scheduling)
Sutton et al., ICML 2009: GTD, GTD2, TDC
The instructor is Professor Gary Cottrell, whose office is CSE Building room 4130. Feel free to send email to arrange an appointment, or telephone (858) 534-6640.
Most recently updated on January 10th, 2012 by Gary Cottrell, firstname.lastname@example.org