The readings
and lectures for the first week are below.
DATE  Homework 
TITLE 

01/11/2012 
Exercises 1.1-1.5. Due: Friday, January 20th, at the beginning of class.
Programming assignment 1: as above. Due: Friday, January 20th, at 11:59PM. Please email me a PDF report on your project (two pages is probably plenty), with graphs in the PDF, along with your code and instructions on how to run it. Thanks! Here is a discussion page for programming assignments.
Lecture 1:
Introduction to exploration and value learning. 
Chapter 1 and 2.1-2.3
01/18/2012 
See above. IF YOU HAVEN'T BEEN GETTING CLASS EMAILS, check your spam folder, and if they aren't there, EMAIL ME NOW!!! gary@ucsd.edu

Chapter 2.4 through 2.11.
Note: I don't lecture about 2.8-2.10, but they are "good for you!"
Chapter 3.1-3.10. Today I got as far as 3.4.

01/20/2012 12:30-1:50 CSE room 3219
Exercises (DUE MONDAY, January 30th)
1: Exercise 2.1 (should be easy - you can simulate it!).
1A: (Extra credit) Exercise 2.2, which amounts to using the softmax in the previous programming assignment.
2: Exercise 2.3 - the sigmoid is 1/(1+e^[-Q(a)/tau]) in this case.
3: Exercise 3.1
4: Exercise 3.5
5: Fill in the missing steps between the last two equations on slide 45. [Hint: it is helpful to use the guide to notation here.]
(Fun with programming:) This is simply a suggestion, not an assignment! Try to replicate Figure 2.5. I suggest initializing the reference reward optimistically.
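Exercise 2.3 relates the two-action softmax to a sigmoid. A quick numerical sanity check, not a proof, might look like the sketch below. The action values `q_a` and `q_b` and the temperature `tau` are made-up numbers, and the function names are my own, not from the textbook.

```python
import math

def softmax_probs(q_values, tau):
    """Gibbs/softmax action probabilities with temperature tau."""
    m = max(q_values)  # subtract the max so exp() cannot overflow
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical action values and temperature, chosen only for illustration.
q_a, q_b, tau = 1.2, 0.4, 0.5

p_softmax = softmax_probs([q_a, q_b], tau)[0]  # P(a) from the softmax
p_sigmoid = sigmoid((q_a - q_b) / tau)         # sigmoid of the scaled difference

print(p_softmax, p_sigmoid)
```

The two printed numbers should agree to floating-point precision; showing why is the exercise.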
PROGRAMMING ASSIGNMENT 2, Due MONDAY, 02/06/2012, 11:59PM:
Implement the algorithm for policy evaluation in Figure 4.1. Apply it to Example 4.1, using the equiprobable random policy (all actions equally likely). Do NOT make your algorithm so specific that this is the only example it can be applied to!
Then, using the same code with minor modifications (as a separate function, obviously), implement value iteration (Section 4.4) and apply it to the same problem.
I will supply you with MATLAB code for reading in the gridworld from files, which should be adaptable for this assignment. These files are from an old assignment and are likely to need updating for your task (but maybe not!).
NOTE: There is a README.txt in this directory that doesn't show up in my browser when I click the link above. I don't know why. It is there, because if you append "README.txt" to the above link and paste it into your browser, you get it...????
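To give a rough idea of the shape such code could take, here is a minimal sketch of iterative policy evaluation in Python. It is not the supplied MATLAB code. It assumes deterministic transitions, and the gridworld encoding (4x4 grid, terminal states in two opposite corners, reward of -1 per step, no discounting) is my reading of Example 4.1. All function and variable names are made up for illustration.

```python
def policy_evaluation(states, actions, policy, step, terminal,
                      gamma=1.0, theta=1e-6):
    """In-place iterative policy evaluation for deterministic transitions.

    policy(s, a) returns the probability of taking action a in state s;
    step(s, a) returns (next_state, reward).
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in terminal:
                continue  # terminal states keep value 0
            v_new = 0.0
            for a in actions:
                s2, r = step(s, a)
                v_new += policy(s, a) * (r + gamma * V[s2])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:  # stop once a full sweep changes nothing much
            return V

# Assumed encoding of the 4x4 gridworld.
N = 4
states = [(row, col) for row in range(N) for col in range(N)]
terminal = {(0, 0), (N - 1, N - 1)}
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(s, a):
    """Moves that would leave the grid keep the agent where it is."""
    row, col = s[0] + a[0], s[1] + a[1]
    if not (0 <= row < N and 0 <= col < N):
        row, col = s
    return (row, col), -1.0

def equiprobable(s, a):
    return 1.0 / len(actions)

V = policy_evaluation(states, actions, equiprobable, step, terminal)
for row in range(N):
    print(" ".join(f"{V[(row, col)]:7.2f}" for col in range(N)))
```

Because the environment is passed in as `states`, `actions`, and `step`, the same function can be pointed at a different problem without changes. Value iteration would replace the policy-weighted sum with a max over actions.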
Lecture 2, continued: finish Chapter 3 (I hope!)

01/23-01/27
NO CLASS!!!! 
Chapter 4.1-4.4
01/30/2012 
(We will discuss answers to the homeworks above at the beginning of this class.)
Dynamic Programming Methods. Lecture 4: Chapter 4.1-4.4
Chapter 4.5-4.8
02/01/2012 
Chapter 4, completed 

02/03/2012 
Programming assignment deadline pushed back to Monday!
Monte Carlo Methods: Chapter 5
Chapter 5 up to 5.4 
02/06/2012 
Programming Assignment Due!! 
Ch 5, continued 
5.5 to 5.8 
02/08/2012 
Ch 5, completed; beginning Chapter 6: TD methods
Chapter 6 to 6.3

02/10/2012 
HOMEWORK: Exercises 6.1-6.4 DUE MONDAY, 02/13/2012
Chapter 6, continued. 
6.4-6.5
02/13/2012 
PROGRAMMING ASSIGNMENT 3, Due Wednesday, 02/22/2012, 11:59PM: Implement SARSA (on-policy learning) and Q-learning (off-policy learning), and apply them to the cliff-walking problem. Reproduce Figure 6.13 (both parts: show both the learning curves and the policies learned).
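The on-policy versus off-policy distinction comes down to which next-state value the update bootstraps from. A minimal sketch of the two tabular updates follows. It assumes Q is stored in a dictionary keyed by (state, action); the state and action names in the demo are placeholders, not part of the cliff-walking problem.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    """On-policy: bootstrap from the action a2 actually selected in s2."""
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha, gamma):
    """Off-policy: bootstrap from the best action in s2, whatever is taken next."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Tiny demo with made-up states, actions, and values.
actions = ["left", "right"]

Q_sarsa = defaultdict(float)  # unseen (state, action) pairs default to 0
Q_sarsa[("s2", "left")] = 1.0
Q_sarsa[("s2", "right")] = 3.0
Q_qlearn = defaultdict(float, Q_sarsa)  # same starting values

# Same transition for both; the behavior policy happened to pick "left" in s2.
sarsa_update(Q_sarsa, "s1", "right", -1.0, "s2", "left", alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, "s1", "right", -1.0, "s2", actions, alpha=0.5, gamma=1.0)

print(Q_sarsa[("s1", "right")], Q_qlearn[("s1", "right")])
```

In a full agent, the entries for terminal states stay at zero, and both methods would typically choose actions epsilon-greedily from Q.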
Chapter 6, cont. 
6.6-6.9
02/15/2012 
Finish Chapter 6.
This essay by Terry Sejnowski. Chapter 7-7.3

02/17/2012 
Chapter 7: Eligibility Traces
Chapter 7 

02/20/2012 
No Class: President's Day 

02/22/2012 
Finish Chapter 7. 

02/24/2012 
No Class: I will be at CoSyNe 

02/27/2012 
No Class: Still at CoSyNe 

02/29/2012 
Chapter 8: Function Approximation
Read Chapter 8 

03/02/2012 
More function approximation 

03/05/2012 
more FA 
Read Chapter 9 

03/07/2012 
No Class: In DC for PI's meeting 

03/09/2012 
Programming Assignment: Apply Linear Function Approximation to the Mountain Car Problem. A MATLAB interface (needs some work) is here. A Python version (also needs work) is here. I suggest you use RBFs to cover the state space, or tile it.
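As one possible starting point for the RBF suggestion, here is a small sketch of Gaussian RBF features over the two-dimensional (position, velocity) state, plus a linear value estimate and weight update. The state bounds are the usual textbook ones for Mountain Car. The grid size, RBF width, step size, and all function names are arbitrary choices of mine and are unrelated to the linked interfaces.

```python
import math

# Usual Mountain Car bounds (assumed; check them against the interface you use).
POS_RANGE = (-1.2, 0.5)
VEL_RANGE = (-0.07, 0.07)

def make_centers(n_per_dim):
    """Evenly spaced RBF centers on the unit square (normalized state space)."""
    ticks = [i / (n_per_dim - 1) for i in range(n_per_dim)]
    return [(p, v) for p in ticks for v in ticks]

def rbf_features(position, velocity, centers, width):
    """Gaussian RBF activations for one state, after scaling it to [0, 1]^2."""
    p = (position - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0])
    v = (velocity - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0])
    return [math.exp(-((p - cp) ** 2 + (v - cv) ** 2) / (2.0 * width ** 2))
            for cp, cv in centers]

def linear_value(weights, features):
    """Linear approximation: dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))

def td_step(weights, features, td_error, alpha):
    """Gradient step; for a linear approximator the gradient is the feature vector."""
    return [w + alpha * td_error * f for w, f in zip(weights, features)]

centers = make_centers(5)                      # 5 x 5 grid = 25 features
phi = rbf_features(-0.5, 0.0, centers, width=0.25)
weights = [0.0] * len(centers)                 # one weight vector per action in practice
weights = td_step(weights, phi, td_error=1.0, alpha=0.1)

print(len(phi), linear_value(weights, phi))
```

Normalizing the state first lets one width serve both dimensions, even though position and velocity have very different scales. Tile coding would replace `rbf_features` with a binary feature vector and leave the rest unchanged.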
Ch. 9, Ch. 10
Read Chapter 10 
03/12/2012 
Chapter 11 (Sutton slides)
Read Chapter 11.
Optional additional reading: original sources for three of the examples:
Sutton (1998) (acrobot)
Crites & Barto (elevator)
Zhang & Dietterich (job-shop scheduling)

03/14/2012 
Sutton et al., ICML 2009: GTD, GTD2, TDC 
The instructor is Professor Gary Cottrell, whose office is CSE Building room 4130.
Feel free to send email to arrange an appointment, or telephone (858) 534-6640.
Most recently updated on January 10th, 2012 by Gary Cottrell, gary@ucsd.edu