DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO

CSE 291: Statistical Learning

Winter 2005


OVERVIEW

CSE 291 is a graduate lecture course devoted to learning methods based on statistics.  The course will mainly cover mathematical concepts and results, but also some algorithms and their analysis.

CSE 291 is open to M.S. and Ph.D. students in computer science, bioinformatics, cognitive science, and related fields.  The course is complementary to other UCSD courses such as Cognitive Science 260, Math 283 (Statistical Methods in Bioinformatics), and ECE 285 (also entitled Statistical Learning).  Students are welcome to take any or all of these courses.  Unlike CSE 254, which will be offered in Spring 2005, CSE 291 is a lecture course.

The prerequisite for CSE 291 is an upper-division undergraduate course on probability and statistics, such as Math 183 or 186 at UCSD, or any graduate course on statistics, pattern recognition, or machine learning.  Students should take CSE 291 for four units, for a letter grade.  Use section id 518456 to register.  (Note that you can register even if Studentlink indicates that the section is full.)

The class meets on Tuesdays and Thursdays, from 2pm to 3:20pm.  Lectures are in York Hall, room 4080A.  York Hall is on the Revelle plaza, directly south of APM.  We are not meeting in Center Hall 224C.  Office hours are Mondays and Wednesdays from 3pm to 3:30pm, or later if necessary, except on Monday January 17.  Appointments are also available on most weekdays.

LECTURE NOTES

Lecture notes for each class meeting will be published here on the class web page, which is http://www-cse.ucsd.edu/users/elkan/291.  Lecture notes from Winter 2004 are available.  Students will cooperate to produce detailed LaTeX lecture notes following these guidelines for scribes.

Date: topics [LaTeX notes, where available]
January 4: Reasoning (probability theory) vs. learning (statistics), estimator vs. estimate, point estimation [LaTeX notes]
January 6: Unbiasedness, mean squared error (MSE), minimum variance unbiased estimator (MVUE), suggested books [LaTeX notes]
January 11: Intuitive concept of sufficiency, definitions of (minimal) sufficient partition and (minimal) sufficient statistic, Bernoulli example [LaTeX notes]
January 13: Rao-Blackwell theorem intuition, nested expectations lemma, Jensen's inequality, start of Rao-Blackwell proof [LaTeX notes]
January 18: Proof of three parts of Rao-Blackwell theorem, uniqueness of MVUEs, algorithm to obtain MVUEs [LaTeX notes]
January 20: Definition of completeness, binomial example, Lehmann-Scheffe theorem, factorization theorem [LaTeX notes]
January 25: Comments on answers to the first assignment: how to make reports and experiments compelling [LaTeX notes]
January 27: Statement of the exponential family completeness theorem, principle of maximum likelihood (ML), the score function [LaTeX notes]
February 1: Expectation of the score function, Cramer-Rao lower bound (CRLB) and when it is achieved, example [LaTeX notes]
February 3: Example of achieving the CRLB, informal hypothesis testing, large-sample ML, consistency and efficiency [LaTeX notes]
February 8: Weak law of large numbers, central limit theorem, Taylor expansion of score function [LaTeX notes]
February 10: Convergence in probability, convergence in distribution, proof of ML asymptotic efficiency, logic of hypothesis testing [LaTeX notes]
February 15: Power function, size and significance level, likelihood ratio tests (LRTs), t-test example [LaTeX notes]
February 17: Chi-squared asymptotic distribution of LRT statistics, Pearson's chi-squared goodness-of-fit test
February 22: Feedback on Assignment 2, Pearson's statistic as an approximation of the LRT statistic, chi-squared tests for contingency tables
February 24: Linear regression: least squares, matrix solution, variance of parameter estimates, F test
March 1: Meaning of F statistics, stepwise selection, multiple comparisons (Sidak, Bonferroni, Westfall-Young)
March 3: MSE = bias^2 + variance, shrinkage and regularization ideas, ridge regression

TEXTS AND TOPICS

Unfortunately no book is close enough to the contents of the course to be suitable as a required text.  The main books to be used are Statistical Inference by S. D. Silvey and The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. H. Friedman.

Other books that are recommended include:

Some specific topics that will likely be covered in CSE 291 include:

The instructor is Charles Elkan, Professor.  Office hours are held in AP&M room 4856.  If you are unable to attend office hours, feel free to send email to arrange an appointment, or telephone (858) 534-8897.

ASSIGNMENTS

There will be five homework assignments, due every second Tuesday in class.  Assignments will be worth 2/3 of the final grade, and the final examination will be worth 1/3.  Questions on the final exam will be similar to assignment questions, but easier.

Each assignment will involve mathematical reasoning and also programming in Matlab.  Students are encouraged to form study groups, to collaborate on solving the problems posed, and to use multiple books and outside resources.  However, each student must write up his or her solutions independently.  Your solutions should be written in good, concise English with all necessary diagrams, plots, and explanations.  You must use LaTeX or similar high-quality software for text processing.  On the due date, submit a printout on 8.5x11 paper in class.  Your submission must be stapled and must not be in any sort of binder.

The first assignment was due in class on Tuesday January 18.  Although this assignment is not easy, it uses only the basic knowledge of probability and statistics that is a prerequisite for this course.  The second assignment was due in class on Tuesday February 1.  The third assignment was due in class on Tuesday February 15.

The fourth assignment is due in class on Tuesday March 1.  You will need this hurricane data.  Please ask questions using http://www.quicktopic.com/29/H/t3sgTnDZkMUqp.

The fifth assignment is due at the time of the final exam, which has been scheduled by the registrar for Thursday March 17, from 3pm to 6pm.  Please ask questions using http://www.quicktopic.com/29/H/NcdrNkr7SUA.



Most recently updated on March 8, 2005 by Charles Elkan, elkan@cs.ucsd.edu