CSE 258 is a graduate course devoted to current methods for recommender systems, data mining, and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Monday/Wednesday evenings, starting September 30. Meetings are in Galbraith Hall.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes. Links are also provided to our Coursera Specialization, which covers similar material.

## Basic Info |

I'll hold office hours on **Tuesdays 9:30-13:00** in CSE 4102. The course TAs will hold additional office hours as follows:

- **Monday** 10:00-12:00: CSE B250A
- **Thursday** 11:00-12:00: CSE B270A
- **Friday** 10:30-12:30: CSE B240A

- **Homework 1:** due Oct 14
- **Homework 2:** due Oct 28
- **Midterm:** Nov 6
- **Homework 3:** due Nov 13
- **Assignment 1:** due Nov 18
- **Homework 4:** due Nov 25
- **Assignment 2:** due Dec 3

- Each **Homework** is worth 8%. Your lowest homework grade (of four) is dropped (or one homework can be skipped).
- The **Midterm** is worth 26%.
- Each **Assignment** is worth 25%. **Assignment 2** is a **group assignment**. All other assessments must be completed individually.
- All assessments are due **before** the Monday lecture on the due date. Late submissions are not accepted.
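
The weighting above can be sanity-checked with a short calculation. This is only a sketch: the function name `course_grade` is hypothetical, it assumes every score is on a 0-100 scale, and it assumes a skipped homework is entered as 0 (so it becomes the dropped one). It is not an official grade calculator.

```python
def course_grade(homeworks, midterm, assignments):
    """Combine scores (each assumed to be in [0, 100]) into a final percentage.

    Weights follow the list above: three counted homeworks at 8% each,
    the midterm at 26%, and two assignments at 25% each (total 100%).
    """
    if len(homeworks) != 4:
        raise ValueError("expected four homework scores (use 0 for a skipped one)")
    if len(assignments) != 2:
        raise ValueError("expected two assignment scores")

    # Drop the lowest of the four homework scores.
    kept = sorted(homeworks)[1:]

    # Integer weights divided by 100 keep the arithmetic exact for integer scores.
    return (8 * sum(kept) + 26 * midterm + 25 * sum(assignments)) / 100


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(course_grade([80, 90, 70, 100], midterm=75, assignments=[85, 95]))
```

Because the lowest homework is dropped, skipping one homework and scoring full marks elsewhere still gives 100%.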

piazza page |

gradescope page |

last year's course webpage |

course outline slides |

1 | ## Supervised Learning: Regression |
---|

- Least-squares regression
- Overfitting and regularization
- Training, validation, and testing

- Bishop ch.3
- Elkan ch.3,6
- Instructions to access videos on Coursera

- CSV and JSON files
- Reading CSV and JSON into Python
- Processing structured data in Python
- Extracting simple statistics from datasets
- Data filtering and cleaning
- Text and string processing in Python
- Time and date data
- Matrix processing and numpy
- Regression in Python
- Features from categorical data
- Features from temporal data
- Feature transformations
- Missing values
- Motivation behind the MSE
- Over and underfitting
- Setting up a codebase for evaluation and validation
- Evaluating a regularized model
- Evaluating classifiers for ranking
- Introduction to Training and Testing
- Validation
- Implementing a regularization pipeline in Python
- Guidelines on the implementation of predictive pipelines

- Workbook 1: CSV/TSV/JSON; extracting simple statistics; pandas; plotting
- Notebook from lecture

Files | week1.py | 50k beer reviews | non-alcoholic beer reviews |
---|

Lecture 1 | slides | + annotations | podcast |
---|

Lecture 2 | slides | + annotations | podcast |
---|

Homework | Homework 1 (due October 14) |
---|

2 | ## Supervised Learning: Classification |
---|

- Logistic regression
- SVMs
- Multiclass and multilabel classification
- How to evaluate classifiers

- Bishop ch.4
- Elkan ch.5,8
- More detailed derivation of the SVM (2018)
- Case study: reddit popularity

- Workbook 2: Classification; gradient descent
- Workbook 3: Classification diagnostics; training/testing
- Notebook from lecture

Files | week2.py | 50k book descriptions | 5k book cover images |
---|

Lecture 3 | slides | + annotations | podcast |
---|

Lecture 4 | slides | + annotations | podcast |
---|

3 | ## Dimensionality Reduction and Clustering |
---|

- Principal Component Analysis
- K-means & hierarchical clustering
- Community detection

- Bishop ch.9
- Elkan ch.13
- More detailed derivation of PCA (2018)

Files | week3.py | facebook ego network |
---|

Lecture 5 | slides | + annotations | podcast |
---|

Lecture 6 | slides | + annotations | podcast |
---|

Homework | Homework 2 (due October 28) |
---|

4 | ## Recommender Systems |
---|

- Collaborative Filtering
- Latent Factor Models

- Elkan ch.11

- Implementing a similarity-based recommender
- Using a similarity-based recommender for rating prediction

- Workbook 4: Recommender systems
- Notebook from lecture

- Read prediction
- Category prediction (158 only)
- Rating prediction (258 only)

Files | week4.py |
---|

Lecture 7 | slides | + annotations | podcast |
---|

Lecture 8 | slides | + annotations | podcast |
---|

Assignment | Assignment 1 (due November 18) | slides |
---|

5 | ## Text Mining |
---|

- Sentiment analysis
- Bags-of-words
- TF-IDF
- Stopwords, stemming, and low-dimensional representations of text

- Elkan ch.12
- Recommender Systems Datasets

Files | week5.py |
---|

Lecture 9 | slides | + annotations | podcast |
---|

Lecture 10 | slides | + annotations | podcast |
---|

Homework | Homework 3 (due November 13) |
---|

Assignment | Assignment 2 (due December 3) | slides |
---|

6 | ## Midterm |
---|

Midterm prep | Nov 4 |
---|

Midterm | Nov 6 |
---|

sp15 midterm (CSE190) | Solutions | Solution video (starts at 49:55) |

fa15 midterm (CSE190) | Solutions | Solution video (starts at 35:10) |

fa15 midterm (CSE255) | Solutions | Solution video (starts at 32:25) |

wi17 midterm (CSE158) | Solutions | Solution video (starts at 42:00) |

wi17 midterm (CSE258) | Solutions | Solution video (starts at 46:00) |

fa17 midterm (CSE158) | Solutions | Solution video (starts at 35:50) |

fa17 midterm (CSE258) | Solutions | Solution video (starts at 40:15) |

fa18 midterm (CSE158) | Solutions | Solution video (starts at 55:50) |

fa18 midterm (CSE258) | Solutions | Solution video (starts at 45:00) |

Midterm prep | slides | + annotations | podcast |
---|

7 | ## Tools and Libraries |
---|

No lecture | November 11 (Veterans Day) |
---|

- Crawling and parsing data from the Web
- Manipulating time and date data
- Simple plotting with Matplotlib
- General-purpose gradient descent in TensorFlow

Files | week7.py |
---|

Lecture 11 | slides | + annotations | podcast |
---|

Homework | Homework 4 (due November 25) |
---|

8 | ## Data Mining in Social Networks |
---|

- Power-laws and small-worlds
- Random graph models
- Triads and weak ties
- HITS and PageRank

- Elkan ch.14
- Networks, Crowds, and Markets (book)

Lecture 12 | slides | + annotations | podcast |
---|

Lecture 13 | slides | + annotations | podcast |
---|

9 | ## State-of-the-art Recommender Systems |
---|

No lecture | November 27 (Thanksgiving) |
---|

**State-of-the-art Recommender Systems**

- Bayesian Personalized Ranking
- Factorizing Personalized Markov Chains for Next-Basket Recommendation
- Personalized Ranking Metric Embedding for Next New POI Recommendation

**Real-world Applications**

- Recommending product sizes to customers
- Playlist prediction via Metric Embedding

Lecture 14 | slides | + annotations | podcast |
---|

10 | ## Modeling Temporal and Sequence Data |
---|

- Sliding windows and autoregression
- Temporal dynamics in recommender systems
- Temporal dynamics in text and social networks

Files | week10.py |
---|

Lecture 15 | slides | + annotations | podcast |
---|

Lecture 16 | slides | + annotations | podcast |
---|