## Basic Info

CSE 158 and 258 are undergraduate and graduate courses devoted to current methods for recommender systems, data mining, and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Monday/Wednesday evenings, starting October 5. Meetings are livestreamed on Twitch, and recordings will also be made available here.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes. Links are also provided to our Coursera Specialization, which covers similar material.

Office hours for each class (and instructions for accessing them) are posted to Piazza.

- **Homework 1:** due Oct 19 (Monday week 3)
- **Homework 2:** due Nov 2 (Monday week 5)
- **Midterm:** Nov 9 (take-home: released 6:30pm Monday Nov 9, due 6:30pm Tuesday Nov 10)
- **Homework 3:** due Nov 16 (Monday week 7)
- **Assignment 1:** due Nov 23 (Monday week 8)
- **Homework 4:** due Nov 30 (Monday week 9)
- **Assignment 2:** due Dec 7 (Monday week 10)

- Each **Homework** is worth 8%. Your lowest homework grade (of four) is dropped (or one homework can be skipped).
- The **(take-home) Midterm** is worth 26%.
- Each **Assignment** is worth 25%. **Assignment 2** is a **group assignment**. All other assessments must be completed individually.
- All assessments are due **before** the Monday lecture on the due date. Late submissions are not accepted.

- Piazza page (CSE258)
- Piazza page (CSE158)
- Last year's course webpage
- Intro and course outline slides

## Week 1: Supervised Learning: Regression

- Least-squares regression
- Overfitting and regularization
- Training, validation, and testing
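As a rough illustration of how the three topics above fit together, the sketch below fits a one-feature regularized least-squares model and picks the regularization strength on a held-out validation set. The toy data, the function names (`fit_ridge_1d`, `mse`), and the candidate values of `lam` are assumptions made for this example; they are not taken from the course materials.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize sum((y - t0 - t1*x)^2) + lam * t1^2 (the intercept t0 is not penalized)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = s_xy / (s_xx + lam)      # closed-form slope; lam > 0 shrinks it toward 0
    theta0 = y_bar - theta1 * x_bar   # intercept follows from the means
    return theta0, theta1


def mse(theta, xs, ys):
    """Mean squared error of the line theta = (theta0, theta1) on (xs, ys)."""
    theta0, theta1 = theta
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: roughly y = 2x + 1 with a little noise
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
valid_x, valid_y = [4.0, 5.0], [9.1, 10.9]

# Fit on the training set only; compare candidates on the validation set
best_lam, best_err = None, float("inf")
for lam in [0.0, 0.1, 1.0, 10.0]:
    theta = fit_ridge_1d(train_x, train_y, lam)
    err = mse(theta, valid_x, valid_y)
    if err < best_err:
        best_lam, best_err = lam, err

print("selected lambda:", best_lam, "validation MSE:", best_err)
```

A separate test set, untouched during this selection, would then be used to report the final error.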

- Bishop ch.3
- Elkan ch.3,6
- Instructions to access videos on coursera

- CSV and JSON files
- Reading CSV and JSON into Python
- Processing structured data in Python
- Extracting simple statistics from datasets
- Data filtering and cleaning
- Text and string processing in Python
- Time and date data
- Matrix processing and numpy
- Regression in Python
- Features from categorical data
- Features from temporal data
- Feature transformations
- Missing values
- Motivation behind the MSE
- Over and underfitting
- Setting up a codebase for evaluation and validation
- Evaluating a regularized model
- Evaluating classifiers for ranking
- Introduction to Training and Testing
- Validation
- Implementing a regularization pipeline in Python
- Guidelines on the implementation of predictive pipelines
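Several of the items above concern reading CSV and JSON data and extracting simple statistics. A minimal sketch using only the Python standard library might look like the following; the in-memory strings and the field names (`user`, `beer`, `rating`) are hypothetical stand-ins for the course datasets.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical in-memory data standing in for files on disk
csv_text = "user,beer,rating\nu1,b1,4.5\nu2,b1,3.0\nu1,b2,5.0\n"
json_lines = (
    '{"user": "u1", "beer": "b1", "rating": 4.5}\n'
    '{"user": "u2", "beer": "b1", "rating": 3.0}\n'
)

# CSV: DictReader maps each row to a dict keyed by the header line.
# Every value is read as a string, so numeric fields are converted by hand.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["rating"] = float(row["rating"])

# JSON: one object per line, parsed into Python dicts with native types
records = [json.loads(line) for line in io.StringIO(json_lines)]

# Simple statistic: average rating per beer
ratings_per_beer = defaultdict(list)
for row in rows:
    ratings_per_beer[row["beer"]].append(row["rating"])
average_rating = {b: sum(r) / len(r) for b, r in ratings_per_beer.items()}

print(average_rating)
```

With real files, `io.StringIO(...)` would be replaced by `open(path)`.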

- Workbook 1: CSV/TSV/JSON; extracting simple statistics; pandas; plotting
- Notebook from lecture

**Files:** week1.py | 50k beer reviews | non-alcoholic beer reviews

**Lecture materials:** lecture 1 video | lecture 2 video | slides | + annotations

## Week 2: Supervised Learning: Classification

- Logistic regression
- SVMs
- Multiclass and multilabel classification
- How to evaluate classifiers
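To make the first and last topics above concrete, here is a small sketch of one-feature logistic regression trained by gradient ascent, followed by a few common evaluation measures. The toy data, learning rate, step count, and function names are assumptions chosen for illustration, not the course's reference implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, steps=500):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += lr * sum(e * x for e, x in zip(errors, xs))
        b += lr * sum(errors)
    return w, b


def evaluate(preds, labels):
    """Accuracy, precision, recall, and balanced error rate for 0/1 labels."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tpr,
        "ber": 1.0 - 0.5 * (tpr + tnr),
    }


# Hypothetical toy data: negative x means class 0, positive x means class 1
xs, ys = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
preds = [1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs]
print(evaluate(preds, ys))
```

The balanced error rate is included because plain accuracy can be misleading when one class is much more frequent than the other.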

- Bishop ch.4
- Elkan ch.5,8
- More detailed derivation of the SVM (2018)
- Case study: reddit popularity

- Workbook 2: Classification; gradient descent
- Workbook 3: Classification diagnostics; training/testing
- Notebook from lecture

**Files:** week2.py | 50k book descriptions | 5k book cover images

**Lecture materials:** lecture 3 video | lecture 4 video | slides | + annotations

## Week 3: Dimensionality Reduction and Clustering

- Principal Component Analysis
- K-means & hierarchical clustering
- Community detection
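As one example from the list above, K-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below is a plain-Python version under simplifying assumptions: initial centroids are passed in explicitly, and the toy points and function names are made up for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm; returns (centroids, assignments)."""
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for each point
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical toy data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, centroids=[points[0], points[3]])
print(centroids, assignments)
```

In practice the result depends on the initialization, so it is common to run the algorithm from several random starting points and keep the best clustering.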

- Bishop ch.9
- Elkan ch.13
- More detailed derivation of PCA (2018)

**Files:** week3.py | facebook ego network

**Lecture materials:** lecture 5 video | lecture 6 video | slides | + annotations

## Week 4: Recommender Systems

- Collaborative Filtering
- Latent Factor Models

- Elkan ch.11

- Implementing a similarity-based recommender
- Using a similarity-based recommender for rating prediction
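The two items above can be sketched roughly as follows, assuming Jaccard similarity between the sets of users who interacted with each item, and a similarity-weighted average for rating prediction. The interaction data and all function names are hypothetical; other similarity measures (cosine, Pearson) or prediction formulas could be swapped in.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) interactions
ratings = [
    ("u1", "a", 5), ("u2", "a", 4), ("u3", "a", 3),
    ("u2", "b", 4), ("u3", "b", 2), ("u4", "b", 3),
    ("u5", "c", 1),
]

users_per_item = defaultdict(set)
items_per_user = defaultdict(set)
rating_of = {}
for u, i, r in ratings:
    users_per_item[i].add(u)
    items_per_user[u].add(i)
    rating_of[(u, i)] = r


def jaccard(s1, s2):
    union = len(s1 | s2)
    return len(s1 & s2) / union if union else 0.0


def most_similar(item, n=5):
    """Items ranked by Jaccard similarity of their user sets to `item`."""
    target = users_per_item.get(item, set())
    sims = [(jaccard(target, users), j) for j, users in users_per_item.items() if j != item]
    return sorted(sims, reverse=True)[:n]


def predict_rating(user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    target = users_per_item.get(item, set())
    num = den = 0.0
    for j in items_per_user.get(user, set()):
        if j == item:
            continue
        sim = jaccard(target, users_per_item[j])
        num += sim * rating_of[(user, j)]
        den += sim
    if den == 0:
        # No similar items rated by this user: fall back to the global mean
        return sum(r for _, _, r in ratings) / len(ratings)
    return num / den


print(most_similar("a"))
print(predict_rating("u1", "b"))
```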

- Workbook 4: Recommender systems
- RSTensorflow: Recommender systems in TensorFlow
- Notebook from lecture

- Play prediction
- Category prediction (158 only)
- Time played prediction (258 only)

**Files:** week4.py

**Lecture materials:** lecture 7 video | lecture 8 video | slides | + annotations

## Week 5: Text Mining

- Sentiment analysis
- Bags-of-words
- TF-IDF
- Stopwords, stemming, and low-dimensional representations of text
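As a small illustration of the bag-of-words and TF-IDF topics above, the sketch below tokenizes a few made-up documents, builds count vectors over a shared vocabulary, and computes TF-IDF weights. It assumes raw term counts for TF and a natural-log IDF of `log(N / df)`; other weighting variants are common, and the documents and function names are hypothetical.

```python
import math
import string
from collections import Counter

# Hypothetical toy corpus
docs = [
    "The beer is good!",
    "The beer is bad.",
    "Great hops, great finish.",
]


def tokenize(text):
    """Lowercase, strip punctuation, split on whitespace."""
    text = "".join(c for c in text.lower() if c not in string.punctuation)
    return text.split()


tokenized = [tokenize(d) for d in docs]

# Document frequency: number of documents containing each word
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

vocabulary = sorted(df)


def bag_of_words(tokens):
    """Count vector over the shared vocabulary (0 for absent words)."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def tfidf(tokens):
    """TF-IDF weights for a document drawn from the corpus above."""
    tf = Counter(tokens)
    return {w: tf[w] * math.log(len(docs) / df[w]) for w in tf}


print(vocabulary)
print(bag_of_words(tokenized[2]))
print(tfidf(tokenized[0]))
```

Words that appear in most documents (here "the", "beer", "is") receive lower weights than words specific to one document, which is the effect stopword removal approximates by hand.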

- Elkan ch.12
- Recommender Systems Datasets

**Files:** week5.py

**Lecture materials:** lecture 9 video | lecture 10 video | slides | + annotations

## Week 6: (Take-home) Midterm

**Midterm:** Nov 9

- Midterm released 6:30pm Monday Nov 9
- Midterm due on gradescope 6:30pm Tuesday Nov 10

| Past midterm | Solutions | Solution video |
|---|---|---|
| sp15 midterm (CSE190) | Solutions | Solution video (starts at 49:55) |
| fa15 midterm (CSE190) | Solutions | Solution video (starts at 35:10) |
| fa15 midterm (CSE255) | Solutions | Solution video (starts at 32:25) |
| wi17 midterm (CSE158) | Solutions | Solution video (starts at 42:00) |
| wi17 midterm (CSE258) | Solutions | Solution video (starts at 46:00) |
| fa17 midterm (CSE158) | Solutions | Solution video (starts at 35:50) |
| fa17 midterm (CSE258) | Solutions | Solution video (starts at 40:15) |
| fa18 midterm (CSE158) | Solutions | Solution video (starts at 55:50) |
| fa18 midterm (CSE258) | Solutions | Solution video (starts at 45:00) |
| fa19 midterm (CSE158) | | |
| fa19 midterm (CSE258) | Solutions | Solution video |

**Lecture materials:** lecture 11 video

## Week 6: Tools and Libraries

**No lecture:** November 11 (Veterans Day)

- Crawling and parsing data from the Web
- Manipulating time and date data
- Simple plotting with Matplotlib
- General-purpose gradient descent in TensorFlow

**Files:** week6.py

**Lecture materials:** slides | + annotations | lecture 12 video

**Assignment:** Assignment 2 (due December 7) | slides

## Week 7: Data Mining in Social Networks

- Power-laws and small-worlds
- Random graph models
- Triads and weak ties
- HITS and PageRank
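For the last topic above, PageRank can be computed by repeatedly redistributing each node's score along its outgoing links. The following is a minimal power-iteration sketch; the damping factor of 0.85, the fixed iteration count, the uniform handling of dangling nodes, and the toy graph are all assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iters=100):
    """graph maps each node to the list of nodes it links to.

    Assumes every link target also appears as a key in `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node receives the "random jump" share first
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out_links = graph[v]
            if out_links:
                share = damping * rank[v] / len(out_links)
                for w in out_links:
                    new_rank[w] += share
            else:
                # Dangling node: spread its score uniformly over all nodes
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: both "a" and "b" link to "c", and "c" links back to "a"
toy_graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Node "c" ends up with the highest score because it receives links from both other nodes, while "b", which nothing links to, keeps only the random-jump share.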

- Elkan ch.14
- Networks, Crowds, and Markets (book)

**Lecture materials:** slides | + annotations | lecture 13 video | lecture 14 video

## Week 8: State-of-the-art Recommender Systems

**No lecture:** November 25 (Thanksgiving) | Thanksgiving concert

**State-of-the-art Recommender Systems**

- Bayesian Personalized Ranking
- Factorizing Personalized Markov Chains for Next-Basket Recommendation
- Personalized Ranking Metric Embedding for Next New POI Recommendation

**Real-world Applications**

- Recommending product sizes to customers
- Playlist prediction via Metric Embedding

**Lecture materials:** slides | + annotations | lecture 15 video

## Week 9: Online Advertising

- Matching & marriage problems

- AdWords
- Bandit algorithms
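One simple bandit strategy is epsilon-greedy: show the ad with the best estimated click rate most of the time, and a random ad otherwise. The sketch below simulates this under stated assumptions: Bernoulli clicks with made-up click probabilities, a fixed `epsilon`, and a seeded random generator so runs are repeatable. It is meant only as an illustration of the explore/exploit trade-off, not as the specific algorithm covered in lecture.

```python
import random


def epsilon_greedy(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over ads with hidden click probabilities."""
    rng = random.Random(seed)
    n_arms = len(click_probs)
    pulls = [0] * n_arms        # how often each ad was shown
    estimates = [0.0] * n_arms  # running mean of observed clicks per ad
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random ad
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1 if rng.random() < click_probs[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward


# Hypothetical click probabilities for three ads
estimates, pulls, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
print(estimates, pulls, total_reward)
```

A larger `epsilon` learns the click rates faster but spends more impressions on ads already known to be weak.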

**Lecture materials:** slides | + annotations | lecture 16 video

## Week 10: Modeling Temporal and Sequence Data

- Sliding windows and autoregression
- Temporal dynamics in recommender systems
- Temporal dynamics in text and social networks
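The first topic above can be sketched in a few lines: a sliding-window average computed with a running sum, and a helper that turns a series into (previous values, next value) pairs suitable for fitting an autoregressive model. The function names and the toy series are assumptions for illustration.

```python
def moving_average(series, window):
    """Mean of each length-`window` slice, computed with a running sum."""
    if window <= 0 or window > len(series):
        return []
    total = sum(series[:window])
    averages = [total / window]
    for i in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest
        total += series[i] - series[i - window]
        averages.append(total / window)
    return averages


def autoregressive_pairs(series, window):
    """(features, target) pairs: the previous `window` values predict the next one."""
    return [(series[i - window:i], series[i]) for i in range(window, len(series))]


# Hypothetical toy series
series = [1, 2, 3, 4, 5]
print(moving_average(series, 3))
print(autoregressive_pairs(series, 3))
```

The running sum makes the moving average linear in the length of the series, rather than recomputing each window from scratch. The pairs from `autoregressive_pairs` can be fed to any regression model, with the window values as features.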

**Files:** week10.py

**Lecture materials:** slides | + annotations | lecture 17 video | lecture 18 video