CS292F (Spring 2021) Statistical Foundation of Reinforcement Learning

Syllabus [link]

Instructor: Prof. Yu-Xiang Wang

Lecture Section: Monday/Wednesday 1:00-2:40 pm

Location: on Zoom (the link will be sent to you via email).

Piazza: https://piazza.com/ucsb/spring2021/cs292/home
Piazza is our main channel of communication. Questions should be posted here.

Gradescope: https://www.gradescope.com/courses/258384
This is where you submit your homework and project reports.

Office hours: by appointment with the instructor.

Course evaluation: 40% homework, 40% project, 10% attendance/participation, 10% scribing.

Scribing: Please volunteer here, and use this LaTeX template.


Acknowledgments: The instructor sincerely thanks Wen Sun, Nan Jiang, and Sham Kakade for sharing
the homeworks and other materials from CS 6789 at Cornell/University of Washington and CS 598 at UIUC.

Course Schedule / Scribed Notes

# | Date | Topic [materials] | Reading | Deadlines
1 | 29-Mar | Introduction and MDP basics [annotated, scribe, video] | AJKS Ch 1.1-1.2 | HW0 out
2 | 31-Mar | Markov Decision Processes I [annotated, scribe, video] | AJKS Ch 1.3-1.5 |
3 | 5-Apr | Markov Decision Processes II [annotated, scribe, video] | AJKS Ch 2 | HW1 out
4 | 7-Apr | MDP III and RL Algorithms I [annotated, scribe, video] | SB Ch 5-6 |
5 | 12-Apr | RL Algorithms II [annotated, scribe, video] | SB Ch 9-10 | HW0 due
6 | 14-Apr | RL Algorithms III and Exploration I: MAB [annotated, video] | SB Ch 13, AJKS Ch 9, AJKS Ch 5.1 |
7 | 19-Apr | Exploration I: MAB and Linear Bandits [annotated, scribe, video] | AJKS Ch 5.1 | Project proposal due
8 | 21-Apr | Exploration II: Linear Bandits [annotated, scribe, video] | AJKS Ch 5.2-5.3 |
9 | 26-Apr | Exploration III: Tabular MDPs [annotated, scribe, video] | AJKS Ch 6 | HW2 out / HW1 due
10 | 28-Apr | Exploration IV: Linear MDP [annotated, scribe, video] | AJKS Ch 7 |
11 | 3-May | Wrap up exploration, Intro to Offline RL [annotated, video] | AJKS Ch 7.3-7.4, Lihong's perspective article | Midterm report due
12 | 5-May | Offline RL: OPE in Bandits and RL [annotated, scribe, video] | (W., Agarwal, Dudik, 2016), (Jiang et al., 2016) |
13 | 10-May | Offline RL: MIS and Fitted Q Iterations [annotated, scribe, video] | (Yin and W., 2019), (Duan and Wang, 2019) | HW2 due
14 | 12-May | Offline RL: Uniform OPE [annotated, video] | (Yin et al., 2020) |
15 | 17-May | Offline RL: Uniform OPE and optimal offline learning [annotated, video] | (Yin et al., 2020) | HW3 out
16 | 19-May | Offline RL: Function approximation [annotated, video] | AJKS Ch 15 |
17 | 24-May | Office Hours / Project Consultation | |
18 | 26-May | Office Hours / Project Consultation | |
19 | 31-May | No lecture, Memorial Day | |
20 | 2-Jun | Mini-Symposium on Statistical RL | | HW3 due / Final project report due