CSE 190: Reinforcement Learning

UNIVERSITY OF CALIFORNIA, SAN DIEGO


Computer Science and Engineering 190: Reinforcement Learning
(Section ID: 746157)

Monday/Wednesday, 3:30--4:50 PM, Computer Science Building, Room 2154

Makeup lectures: Fridays, 12:30--1:50 PM, CSE Building, Room B260


Professor Gary Cottrell

Winter 2012



There are roughly four different kinds of learning: supervised, unsupervised, imitation, and reinforcement learning.

In supervised learning, the agent is told exactly what to do in each situation, and the goal is to generalize to new situations.
In unsupervised learning, the goal is to find a good representation of the data, under various metrics.
In imitation learning, the agent is shown how to do something, and must learn by trying to reproduce those actions.

Reinforcement learning is probably the kind of learning closest to "real life," and it is one of the best-established learning mechanisms in the brain.

In reinforcement learning, the agent learns by interacting with the environment, trying the different actions available to it, and receiving evaluative feedback in the form of a reward signal. Thus, much of the learning is by trial and error. The goal of the agent, as in life, is to maximize its long-term expected reward. In this course, we will work through the online textbook by Sutton & Barto, Reinforcement Learning: An Introduction. Since this is the first time I have taught this course, I am not yet sure whether we will be able to get through the whole book and on to some more recent articles.

The goal of the course is that students should be able to pick up a recent article about reinforcement learning and understand 75% of it (about three out of every four words! ;-)).

There will be frequent programming assignments that start small and build over the quarter, so that you will have first-hand knowledge of the material.

There will not be a midterm or a final.

Prerequisites: Linear algebra and vector calculus (e.g., Math 20A, 20E, 20F), probability and statistics (e.g., CSE 103, Math 183), and a good working knowledge of a reasonable programming language. MATLAB or Octave would be good choices.

Textbook:

The textbook is available online here: Reinforcement Learning: An Introduction
It is a good idea to keep a copy of the errata by your side as you read. One problem is that the errata refer to page numbers, and the HTML version doesn't have those, but usually you can figure it out from the context.
You can also buy a hard copy here.

Evaluation

Your grade will be based on your programming assignments and the exercises from the text. We will not grade the exercises from the text; rather, we will discuss the answers in class. However, I do expect you to turn in your answers so that I know you are doing the work. There may eventually be a final project of your own choosing, but at the moment I am planning on everyone doing the same programming projects.

Programming exercises

In order to better understand what is going on in the text, I will assign programming exercises that are closely tied to it. For example, the first programming assignment is to replicate Figure 2.1 in the text. I hope to get far enough along on the first day that it can be due next week.
One nice thing about MATLAB is that it is easy to make graphs.
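For orientation, here is a minimal MATLAB/Octave sketch of the kind of simulation Figure 2.1 involves, assuming the 10-armed testbed setup of Chapter 2 (2000 randomly generated bandit tasks, 1000 plays each, sample-average value estimates, epsilon-greedy action selection). The variable names and parameter values are illustrative only; treat this as a starting point, not the required structure of your solution.

% Sketch: epsilon-greedy on the 10-armed testbed (assumed setup: 2000 tasks,
% 1000 plays, true action values drawn from N(0,1), rewards from N(q*(a),1)).
nArms = 10; nTasks = 2000; nPlays = 1000;
epsilons = [0 0.01 0.1];
avgReward = zeros(numel(epsilons), nPlays);
for e = 1:numel(epsilons)
  for t = 1:nTasks
    qstar = randn(1, nArms);              % true action values for this task
    Q = zeros(1, nArms);                  % sample-average estimates
    N = zeros(1, nArms);                  % action counts
    for p = 1:nPlays
      if rand < epsilons(e)
        a = randi(nArms);                 % explore: random action
      else
        [~, a] = max(Q);                  % exploit: greedy action
      end
      r = qstar(a) + randn;               % reward
      N(a) = N(a) + 1;
      Q(a) = Q(a) + (r - Q(a)) / N(a);    % incremental sample average
      avgReward(e, p) = avgReward(e, p) + r;
    end
  end
end
avgReward = avgReward / nTasks;
plot(1:nPlays, avgReward');
xlabel('Plays'); ylabel('Average reward');
legend('epsilon = 0', 'epsilon = 0.01', 'epsilon = 0.1');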

For more information on MATLAB at UCSD, and for tutorials, follow this link to Serge Belongie's page on MATLAB.

Class participation

Class participation is just that. I expect you to be actively engaged in class, and to ask questions when you don't follow something.

The readings, lectures, and assignments are listed in the schedule below.


Schedule. Each entry below gives the date, the homework and programming assignments, the lecture title, and the reading (you should at least start the reading on the date assigned, and hopefully finish it by the next lecture).
01/11/2012
Exercises 1.1-1.5. Due: Friday, January 20th, at the beginning of class.

Programming assignment 1: as above. Due: Friday, January 20th, at 11:59 PM. Please email me a PDF of a report on your project (two pages is plenty, probably), with graphs included in the PDF, along with your code and instructions on how to run it. Thanks! Here is a discussion page for programming assignments.
Lecture 1: Introduction to exploration and value learning.

Chapter 1 and 2.1-2.3

01/18/2012
See above.

IF YOU HAVEN'T BEEN GETTING CLASS EMAILS, check your spam folder, and if they aren't there, EMAIL ME NOW! gary@ucsd.edu


Chapter 2.4 through 2.11. Note: I don't lecture on 2.8-2.10, but they are "good for you!"
Chapter 3.1-3.10
Today I got as far as 3.4.
01/20/2012
12:30-1:50
CSE room 3219
Exercises (DUE MONDAY, January 30th):
1: Exercise 2.1 (should be easy -- you can simulate it!).
1A: (Extra credit) Exercise 2.2: use softmax action selection in the previous programming assignment (see the sketch after this list).
2: Exercise 2.3 -- the sigmoid is 1/(1 + e^(Q(a)/tau)) in this case.
3: Exercise 3.1
4: Exercise 3.5
5: Fill in the missing steps between the last two equations on slide 45. [Hint: it is helpful to use the guide to notation here.]

(Fun with programming:) This is simply a suggestion, not an assignment! Try to replicate Figure 2.5. I suggest initializing the reference reward optimistically.
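For item 1A above, here is a minimal MATLAB/Octave sketch of softmax (Gibbs) action selection with temperature tau. The function name softmaxAction is illustrative (save it as softmaxAction.m), and the subtraction of max(Q) is just for numerical stability; it does not change the probabilities.

% Softmax (Gibbs) action selection: P(a) proportional to exp(Q(a)/tau).
function a = softmaxAction(Q, tau)
  prefs = exp((Q - max(Q)) / tau);        % unnormalized preferences
  probs = prefs / sum(prefs);             % action probabilities
  a = find(rand <= cumsum(probs), 1);     % sample an action index
end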

PROGRAMMING ASSIGNMENT 2,
Due MONDAY, 02/06/2012, 11:59PM:

Implement the algorithm for policy evaluation in Figure 4.1. Apply it to Example 4.1, using the equiprobable random policy (all actions equally likely). Do NOT make your algorithm so specific that this is the only example it can be applied to!
Then, using the same code but making minor modifications to it (as a separate function, obviously), implement value iteration (Section 4.4) and apply it to the same problem.
I will supply you with MATLAB code, which should be adaptable for this assignment, to read in the gridworld from files. These are from an old assignment and are likely to need updating for your task (but maybe not!). NOTE: There is a README.txt in this directory that doesn't show up in my browser when I click the link above; I don't know why. It is there: if you append "README.txt" to the link above in your browser, you will get it.
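To give a sense of the structure, here is a minimal MATLAB/Octave sketch of in-place iterative policy evaluation for the equiprobable random policy, hard-coded to the 4x4 gridworld of Example 4.1 (gamma = 1, reward -1 per step, moves off the grid leave the state unchanged). The stopping threshold theta is an arbitrary choice, and the assignment asks for something more general than this.

% In-place iterative policy evaluation (as in Figure 4.1), hard-coded to the
% 4x4 gridworld of Example 4.1. V should converge to the state values given
% in the book for this example.
rows = 4; cols = 4; gamma = 1; theta = 1e-6;
V = zeros(rows, cols);
isTerminal = false(rows, cols);
isTerminal(1, 1) = true; isTerminal(rows, cols) = true;
moves = [-1 0; 1 0; 0 -1; 0 1];           % up, down, left, right
delta = inf;
while delta > theta
  delta = 0;
  for r = 1:rows
    for c = 1:cols
      if isTerminal(r, c), continue; end
      v = V(r, c);
      newV = 0;
      for k = 1:4
        nr = min(max(r + moves(k, 1), 1), rows);   % off-grid moves stay put
        nc = min(max(c + moves(k, 2), 1), cols);
        newV = newV + 0.25 * (-1 + gamma * V(nr, nc));
      end
      V(r, c) = newV;
      delta = max(delta, abs(v - V(r, c)));
    end
  end
end
disp(V)

Value iteration (Section 4.4) changes only the inner update: instead of the 0.25-weighted sum over the four actions, take the max over the four one-step lookahead values.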
Lecture 2, continued: finish Chapter 3 (I hope!)

01/23-01/27
NO CLASS!!!!

Chapter 4.1 - 4.4
01/30/2012
(We will discuss answers to the homework above at the beginning of this class.)
Dynamic Programming Methods
Lecture 4:
Chapter 4.1-4.4

Chapter 4.5 - 4.8
02/01/2012

Chapter 4, completed

02/03/2012
Programming assignment deadline pushed back to Monday!
Monte Carlo Methods
Chapter 5

Chapter 5 up to 5.4
02/06/2012
Programming Assignment Due!!
Ch 5, continued
5.5 to 5.8
02/08/2012

Ch 5, completed, beginning Chapter 6: TD methods
Chapter 6 to 6.3
02/10/2012
HOMEWORK: Exercises 6.1-6.4 DUE MONDAY, 02/13/2012
Chapter 6, continued.
6.4-6.5
02/13/2012
PROGRAMMING ASSIGNMENT 3,
Due Wednesday, 02/22/2012, 11:59PM:
Implement SARSA (on-policy learning) and Q-learning (off-policy learning), and apply them to the cliff-walking problem. Reproduce Figure 6.13 (both parts: show both the learning curves and the policies learned).
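As a starting point, here is a minimal MATLAB/Octave sketch of the Q-learning half of the assignment on the cliff-walking gridworld of Example 6.6. The step size alpha and the number of episodes are illustrative choices, not values from the text; the comment in the update shows where SARSA differs.

% Q-learning on the cliff-walking gridworld (Example 6.6): 4x12 grid, start
% at the lower left, goal at the lower right, reward -100 for stepping off
% the cliff (which returns the agent to the start), -1 otherwise.
rows = 4; cols = 12; nA = 4;
alpha = 0.5; gamma = 1; epsilon = 0.1; nEpisodes = 500;   % alpha, nEpisodes are guesses
moves = [-1 0; 1 0; 0 -1; 0 1];           % up, down, left, right
Q = zeros(rows, cols, nA);
returns = zeros(1, nEpisodes);
for ep = 1:nEpisodes
  r = rows; c = 1;                        % start state
  while ~(r == rows && c == cols)         % goal state ends the episode
    if rand < epsilon
      a = randi(nA);
    else
      [~, a] = max(Q(r, c, :));
    end
    nr = min(max(r + moves(a, 1), 1), rows);
    nc = min(max(c + moves(a, 2), 1), cols);
    if nr == rows && nc > 1 && nc < cols  % fell off the cliff
      rew = -100; nr = rows; nc = 1;
    else
      rew = -1;
    end
    % Q-learning (off-policy): bootstrap from the greedy next action.
    % SARSA (on-policy) would instead bootstrap from Q(nr, nc, a2), where a2
    % is the epsilon-greedy action actually taken in the next state.
    Q(r, c, a) = Q(r, c, a) + alpha * (rew + gamma * max(Q(nr, nc, :)) - Q(r, c, a));
    returns(ep) = returns(ep) + rew;
    r = nr; c = nc;
  end
end
plot(returns); xlabel('Episode'); ylabel('Sum of rewards during episode');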
Chapter 6, cont.
6.6-6.9
02/15/2012

Finish Chapter 6.
This essay by Terry Sejnowski
Chapter 7-7.3
02/17/2012

Chapter 7: Eligibility Traces
Chapter 7
02/20/2012
No Class: Presidents' Day


02/22/2012

Finish Chapter 7.

02/24/2012
No Class: I will be at CoSyNe


02/27/2012
No Class: Still at CoSyNe


02/29/2012

Chapter 8: Function Approximation
Read Chapter 8
03/02/2012

More function approximation

03/05/2012

more FA
Read Chapter 9
03/07/2012
No Class: In DC for PI's meeting


03/09/2012
Programming Assignment: Apply linear function approximation to the Mountain Car problem. A MATLAB interface (needs some work) is here. A Python version (also needs work) is here. I suggest you use RBFs for the state space, or tile it.
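In case it helps you get started, here is a minimal MATLAB/Octave sketch of linear gradient-descent SARSA (without eligibility traces) using a grid of Gaussian RBF features over (position, velocity). The dynamics follow the mountain-car example in Chapter 8; the number of centers, the RBF widths, the step size, the fixed start state, and the per-episode step cap are all guesses you will want to tune, and you should adapt the code to whichever supplied interface you use.

% Linear gradient-descent SARSA with RBF features for mountain car.
% Dynamics as in Chapter 8: position in [-1.2, 0.5], velocity in
% [-0.07, 0.07], throttle in {-1, 0, +1}, reward -1 per step until the
% car reaches the goal at position >= 0.5.
[cx, cv] = meshgrid(linspace(-1.2, 0.5, 8), linspace(-0.07, 0.07, 8));
centers = [cx(:) cv(:)];                   % 64 RBF centers (a guess)
sig = [0.2 0.02];                          % RBF widths for position, velocity (a guess)
phi = @(x, v) exp(-0.5 * (((centers(:, 1) - x) / sig(1)).^2 + ...
                          ((centers(:, 2) - v) / sig(2)).^2));
nA = 3; acts = [-1 0 1];                   % reverse, coast, forward
w = zeros(size(centers, 1), nA);           % one weight vector per action
alpha = 0.05; gamma = 1; epsilon = 0.1; nEpisodes = 100; maxSteps = 20000;
steps = zeros(1, nEpisodes);
for ep = 1:nEpisodes
  x = -0.5; v = 0;                         % fixed start near the valley bottom (a simplification)
  f = phi(x, v);
  [~, a] = max(w' * f);
  while x < 0.5 && steps(ep) < maxSteps    % step cap is just a safety net
    v = min(max(v + 0.001 * acts(a) - 0.0025 * cos(3 * x), -0.07), 0.07);
    x = x + v;
    if x < -1.2, x = -1.2; v = 0; end      % hit the left wall: velocity reset
    fn = phi(x, v);
    if rand < epsilon
      an = randi(nA);
    else
      [~, an] = max(w' * fn);
    end
    if x >= 0.5
      target = -1;                         % terminal transition: no bootstrapping
    else
      target = -1 + gamma * w(:, an)' * fn;
    end
    w(:, a) = w(:, a) + alpha * (target - w(:, a)' * f) * f;
    f = fn; a = an;
    steps(ep) = steps(ep) + 1;
  end
end
plot(steps); xlabel('Episode'); ylabel('Steps to reach the goal');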
Ch. 9, Ch. 10
Read Chapter 10
03/12/2012

Chapter 11 (Sutton slides)
Read Chapter 11
Optional additional reading: Original sources for three of the examples:
Sutton (1998) (acrobot)
Crites & Barto (elevator)
Zhang & Dietterich (job-shop scheduling)
03/14/2012


Sutton et al., ICML 2009: GTD, GTD2, TDC

The instructor is Professor Gary Cottrell, whose office is CSE Building, Room 4130. Feel free to send email to arrange an appointment, or telephone (858) 534-6640.


Most recently updated on January 10th, 2012 by Gary Cottrell, gary@ucsd.edu