## Basic Info

CSE 158 and 258 are undergraduate and graduate courses devoted to current methods for recommender systems, data mining, and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Monday/Wednesday evenings, starting October 5. Meetings are livestreamed on Twitch, and recordings will also be made available here.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes. Links are also provided to our Coursera Specialization, which covers similar material.

- **Homework 1:** due Oct 19 (Monday, week 3)
- **Homework 2:** due Nov 2 (Monday, week 5)
- **Midterm:** Nov 9 (take-home, due Monday, week 6)
- **Homework 3:** due Nov 16 (Monday, week 7)
- **Assignment 1:** due Nov 23 (Monday, week 8)
- **Homework 4:** due Nov 30 (Monday, week 9)
- **Assignment 2:** due Dec 7 (Monday, week 10)

- Each **Homework** is worth 8%. Your lowest homework grade (of four) is dropped, so one homework can be skipped.
- The **(take-home) Midterm** is worth 26%.
- Each **Assignment** is worth 25%. **Assignment 2** is a **group assignment**. All other assessments must be completed individually.
- All assessments are due **before** the Monday lecture on the due date. Late submissions are not accepted.
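As a quick check on the weights: with the lowest homework dropped, three homeworks (3 × 8%), the midterm (26%), and two assignments (2 × 25%) sum to 100%. The sketch below assumes every score is a fraction in [0, 1] and that a skipped homework is entered as 0; `final_grade` is a hypothetical helper, not an official course tool.

```python
def final_grade(homeworks, midterm, assignments):
    """Weighted course grade; all scores are fractions in [0, 1].

    homeworks: four scores (a skipped homework is entered as 0 and gets dropped).
    midterm: a single score. assignments: two scores.
    """
    if len(homeworks) != 4 or len(assignments) != 2:
        raise ValueError("expected 4 homework scores and 2 assignment scores")
    best_three = sorted(homeworks)[1:]  # drop the lowest homework
    return 0.08 * sum(best_three) + 0.26 * midterm + 0.25 * sum(assignments)


# Full marks everywhere except one skipped homework:
# 3 * 0.08 + 0.26 + 2 * 0.25 = 1.0 (up to floating-point rounding)
print(final_grade([1, 1, 1, 0], 1, [1, 1]))
```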

Last year's course webpage

Intro and course outline slides

## Week 1: Supervised Learning: Regression

- Least-squares regression
- Overfitting and regularization
- Training, validation, and testing
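A minimal sketch tying these three topics together: closed-form least squares with an L2 (ridge) penalty, with the regularization strength chosen on a validation set and the test set used only once at the end. The synthetic data, the 60/20/20 split, and the grid of λ values are illustrative assumptions, not the course's own setup.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Regularized least squares: theta = (X^T X + lam * I)^{-1} X^T y.
    For simplicity this also penalizes the offset (constant) feature."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def mse(X, y, theta):
    return float(np.mean((X @ theta - y) ** 2))


# Synthetic data: a constant feature for the offset plus 5 random features
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_theta = np.array([1.0, 2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ true_theta + rng.normal(scale=0.5, size=n)

# 60/20/20 split into training, validation, and test sets
X_train, y_train = X[:180], y[:180]
X_valid, y_valid = X[180:240], y[180:240]
X_test, y_test = X[240:], y[240:]

# Pick lambda by validation error only
best_lam, best_theta, best_valid = None, None, float("inf")
for lam in [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]:
    theta = fit_ridge(X_train, y_train, lam)
    err = mse(X_valid, y_valid, theta)
    if err < best_valid:
        best_lam, best_theta, best_valid = lam, theta, err

# Report test error once, for the selected model
test_mse = mse(X_test, y_test, best_theta)
print(best_lam, best_valid, test_mse)
```

With `lam = 0` the solution coincides with ordinary least squares; larger values shrink θ toward zero, trading training fit for lower variance.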

- Bishop ch.3
- Elkan ch.3,6
- Instructions to access videos on Coursera

- CSV and JSON files
- Reading CSV and JSON into Python
- Processing structured data in Python
- Extracting simple statistics from datasets
- Data filtering and cleaning
- Text and string processing in Python
- Time and date data
- Matrix processing and numpy
- Regression in Python
- Features from categorical data
- Features from temporal data
- Feature transformations
- Missing values
- Motivation behind the MSE
- Over- and underfitting
- Setting up a codebase for evaluation and validation
- Evaluating a regularized model
- Evaluating classifiers for ranking
- Introduction to Training and Testing
- Validation
- Implementing a regularization pipeline in Python
- Guidelines on the implementation of predictive pipelines
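A small sketch of the first few steps in this list using only the standard library: reading tab-separated and line-delimited JSON data, dropping rows with a missing value, and computing simple statistics. The inline sample rows and column names are made up for illustration; with a real file you would pass `open(path, newline="")` instead of `io.StringIO(...)`.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data standing in for real files
tsv_text = "user\tbeer\tstyle\trating\nu1\tb1\tIPA\t4.5\nu2\tb1\tIPA\t4.0\nu1\tb2\tStout\t3.5\nu3\tb3\tIPA\t\n"
json_lines = '{"user": "u1", "text": "Great hops"}\n{"user": "u2", "text": "Too bitter"}\n'

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Cleaning: skip rows whose rating is missing, convert the rest to float
clean = []
for r in rows:
    if r["rating"].strip() == "":
        continue
    r["rating"] = float(r["rating"])
    clean.append(r)

# Simple statistics: overall mean rating and mean rating per style
mean_rating = sum(r["rating"] for r in clean) / len(clean)
by_style = defaultdict(list)
for r in clean:
    by_style[r["style"]].append(r["rating"])
style_means = {s: sum(v) / len(v) for s, v in by_style.items()}

# JSON with one object per line: parse each line separately
reviews = [json.loads(line) for line in json_lines.splitlines() if line.strip()]

print(len(rows), len(clean), mean_rating, style_means, reviews[0]["text"])
```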

- Workbook 1: CSV/TSV/JSON; extracting simple statistics; pandas; plotting
- Notebook from lecture

**Files:** week1.py, 50k beer reviews, non-alcoholic beer reviews

**Lecture materials:** slides (+ annotations)

## Week 2: Supervised Learning: Classification

- Logistic regression
- SVMs
- Multiclass and multilabel classification
- How to evaluate classifiers
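A minimal sketch of logistic regression trained by plain gradient ascent on an L2-regularized average log-likelihood. The two-cluster synthetic data, the step size, the iteration count, and the value of `lam` are assumptions chosen for illustration; SVMs are not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lam=0.01, lr=0.1, iters=2000):
    """Gradient ascent on
    (1/n) sum_i [y_i log s(x_i.theta) + (1 - y_i) log(1 - s(x_i.theta))] - lam * ||theta||^2.
    The offset term is penalized too, for simplicity."""
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = sigmoid(X @ theta)                       # predicted P(y = 1 | x)
        grad = X.T @ (y - p) / n - 2 * lam * theta  # gradient of the objective above
        theta += lr * grad                           # ascent step
    return theta


# Two synthetic Gaussian clusters, plus a constant feature for the offset
rng = np.random.default_rng(1)
n = 200
X_pos = rng.normal(loc=1.5, size=(n // 2, 2))
X_neg = rng.normal(loc=-1.5, size=(n // 2, 2))
X = np.column_stack([np.ones(n), np.vstack([X_pos, X_neg])])
y = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])

theta = fit_logistic(X, y)
pred = sigmoid(X @ theta) > 0.5
accuracy = float(np.mean(pred == y))
print(theta, accuracy)
```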

- Bishop ch.4
- Elkan ch.5,8
- More detailed derivation of the SVM (2018)
- Case study: Reddit popularity

- Workbook 2: Classification; gradient descent
- Workbook 3: Classification diagnostics; training/testing
- Notebook from lecture
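A sketch of common classification diagnostics computed from true and predicted binary labels, plus precision@K for evaluating a classifier's scores as a ranking. The example labels are made up to show why accuracy alone can mislead on imbalanced data; the function names are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, balanced error rate (BER), precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "BER": 1 - 0.5 * (tpr + tnr),
        "precision": precision,
        "recall": tpr,
        "F1": f1,
    }


def precision_at_k(scores, y_true, k):
    """Fraction of relevant items among the k highest-scoring ones."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(y_true[i] for i in ranked[:k]) / k


# 3 positives, 7 negatives: a classifier that finds only one positive
# still reaches 80% accuracy, but its BER is 1/3
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                             [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
```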

**Files:** week2.py, 50k book descriptions, 5k book cover images

**Lecture materials:** slides (+ annotations)

## Week 3: Dimensionality Reduction and Clustering

- Principal Component Analysis
- K-means & hierarchical clustering
- Community detection
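A compact sketch of the first two topics: PCA computed from the SVD of mean-centered data, and K-means via Lloyd's algorithm (alternating nearest-centroid assignment and centroid updates). The two-blob synthetic data and the random initialization scheme are illustrative assumptions; hierarchical clustering and community detection are not shown.

```python
import numpy as np


def pca(X, k):
    """Project X onto its top-k principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal directions
    components = Vt[:k]
    explained = (s[:k] ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return Xc @ components.T, components, explained


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm, initialized with k distinct random data points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two synthetic blobs separated along the first axis
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 3)) + [5.0, 0.0, 0.0],
               rng.normal(size=(50, 3)) - [5.0, 0.0, 0.0]])
Z, comps, explained = pca(X, 2)
labels, centroids = kmeans(X, 2)
print(explained, labels)
```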

- Bishop ch.9
- Elkan ch.13
- More detailed derivation of PCA (2018)

**Files:** week3.py, Facebook ego network

**Lecture materials:** slides (+ annotations)

## Week 4: Recommender Systems

- Collaborative Filtering
- Latent Factor Models
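A minimal sketch of a latent factor model with bias terms, predicting r̂(u, i) = α + β_u + β_i + γ_u · γ_i and trained by stochastic gradient descent on squared error with L2 regularization. The toy ratings and the values of `k`, `lam`, `lr`, and `epochs` are assumptions for illustration, not the course's reference implementation.

```python
import random

import numpy as np


def train_latent_factor(ratings, k=2, lam=0.01, lr=0.02, epochs=500, seed=0):
    """ratings: list of (user, item, rating) triples. Returns a predict(u, i) function."""
    rng = np.random.default_rng(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    alpha = float(np.mean([r for _, _, r in ratings]))
    beta_u = {u: 0.0 for u in users}
    beta_i = {i: 0.0 for i in items}
    gamma_u = {u: rng.normal(scale=0.1, size=k) for u in users}
    gamma_i = {i: rng.normal(scale=0.1, size=k) for i in items}
    order = list(ratings)
    shuffler = random.Random(seed)
    for _ in range(epochs):
        shuffler.shuffle(order)
        for u, i, r in order:
            err = alpha + beta_u[u] + beta_i[i] + gamma_u[u] @ gamma_i[i] - r
            alpha -= lr * err
            beta_u[u] -= lr * (err + lam * beta_u[u])
            beta_i[i] -= lr * (err + lam * beta_i[i])
            gu = gamma_u[u].copy()  # keep the old value for the item update
            gamma_u[u] -= lr * (err * gamma_i[i] + lam * gamma_u[u])
            gamma_i[i] -= lr * (err * gu + lam * gamma_i[i])

    def predict(u, i):
        # Unseen users or items fall back to whichever bias terms are available
        pred = alpha + beta_u.get(u, 0.0) + beta_i.get(i, 0.0)
        if u in gamma_u and i in gamma_i:
            pred += gamma_u[u] @ gamma_i[i]
        return float(pred)

    return predict


# Hypothetical ratings with two "taste groups"
ratings = [
    ("alice", "A", 5), ("alice", "B", 4), ("alice", "C", 1),
    ("bob", "A", 4), ("bob", "B", 5), ("bob", "C", 2),
    ("carol", "A", 1), ("carol", "C", 5), ("dave", "B", 2), ("dave", "C", 4),
]
predict = train_latent_factor(ratings)
training_mse = float(np.mean([(predict(u, i) - r) ** 2 for u, i, r in ratings]))
print(training_mse, predict("carol", "B"))
```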

- Elkan ch.11

- Implementing a similarity-based recommender
- Using a similarity-based recommender for rating prediction
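One common formulation of a similarity-based recommender: measure item-to-item similarity with the Jaccard similarity of the sets of users who interacted with each item, then predict a rating as the similarity-weighted average of the user's ratings on other items. The interaction data and function names are hypothetical, and the fallback to the global mean is one choice among several.

```python
from collections import defaultdict


def jaccard(s1, s2):
    union = len(s1 | s2)
    return len(s1 & s2) / union if union else 0.0


# Hypothetical (user, item, rating) interactions
ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i2", 2),
    ("u2", "i3", 5), ("u3", "i3", 4), ("u3", "i4", 1), ("u4", "i1", 5),
]

users_per_item = defaultdict(set)
items_per_user = defaultdict(set)
rating_of = {}
for u, i, r in ratings:
    users_per_item[i].add(u)
    items_per_user[u].add(i)
    rating_of[(u, i)] = r
global_mean = sum(r for _, _, r in ratings) / len(ratings)


def most_similar(item, n=2):
    """Other items ranked by Jaccard similarity of their user sets."""
    target = users_per_item.get(item, set())
    sims = [(jaccard(target, users_per_item[j]), j) for j in users_per_item if j != item]
    sims.sort(reverse=True)
    return sims[:n]


def predict_rating(user, item):
    """Similarity-weighted average of the user's other ratings;
    falls back to the global mean if no similar item was rated."""
    target = users_per_item.get(item, set())
    num, den = 0.0, 0.0
    for j in items_per_user.get(user, set()):
        if j == item:
            continue
        sim = jaccard(target, users_per_item[j])
        num += sim * rating_of[(user, j)]
        den += sim
    return num / den if den > 0 else global_mean


print(most_similar("i1"), predict_rating("u1", "i3"))
```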

- Workbook 4: Recommender systems
- Notebook from lecture

**Files:** week4.py

**Lecture materials:** slides (+ annotations)

## Week 5: Text Mining

- Sentiment analysis
- Bags-of-words
- TF-IDF
- Stopwords, stemming, and low-dimensional representations of text
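A small sketch of a bag-of-words and TF-IDF pipeline: lowercase, strip punctuation, drop stopwords, count words, then weight each count by log(N / df), where df is the number of documents containing the word. The example reviews and the tiny stopword list are made up; stemming and low-dimensional representations are not shown, and other TF-IDF variants (log-scaled tf, smoothed idf) are also common.

```python
import math
import string
from collections import Counter

docs = [
    "This beer is great, really great hops",
    "Terrible beer, flat and sour",
    "Great stout with a smooth finish",
]  # hypothetical reviews

stopwords = {"this", "is", "and", "a", "with", "really"}  # tiny illustrative list


def tokenize(text):
    # lowercase, remove punctuation, split on whitespace, drop stopwords
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in stopwords]


tokenized = [tokenize(d) for d in docs]
bags = [Counter(toks) for toks in tokenized]  # bag-of-words counts
doc_freq = Counter(w for toks in tokenized for w in set(toks))
N = len(docs)


def tf_idf(bag):
    # tf-idf(w, d) = tf(w, d) * log(N / df(w))
    return {w: c * math.log(N / doc_freq[w]) for w, c in bag.items()}


vectors = [tf_idf(bag) for bag in bags]
print(vectors[0])
```

Words that appear in every document get weight log(1) = 0, which is how TF-IDF discounts uninformative terms.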

- Elkan ch.12
- Recommender Systems Datasets

**Files:** week5.py

**Lecture materials:** slides (+ annotations)

## Week 6: (Take-home) Midterm due

**Midterm due:** Nov 9

| Past midterm | Solutions | Video |
|---|---|---|
| sp15 midterm (CSE190) | Solutions | Solution video (starts at 49:55) |
| fa15 midterm (CSE190) | Solutions | Solution video (starts at 35:10) |
| fa15 midterm (CSE255) | Solutions | Solution video (starts at 32:25) |
| wi17 midterm (CSE158) | Solutions | Solution video (starts at 42:00) |
| wi17 midterm (CSE258) | Solutions | Solution video (starts at 46:00) |
| fa17 midterm (CSE158) | Solutions | Solution video (starts at 35:50) |
| fa17 midterm (CSE258) | Solutions | Solution video (starts at 40:15) |
| fa18 midterm (CSE158) | Solutions | Solution video (starts at 55:50) |
| fa18 midterm (CSE258) | Solutions | Solution video (starts at 45:00) |

## Week 6: Tools and Libraries

**No lecture:** November 11 (Veterans Day)

- Crawling and parsing data from the Web
- Manipulating time and date data
- Simple plotting with Matplotlib
- General-purpose gradient descent in TensorFlow
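A short sketch of manipulating time and date data with Python's standard `datetime` module: parsing a formatted string, converting a Unix timestamp, and taking differences between dates. The specific strings and timestamp are illustrative; the format codes follow the `strptime` documentation.

```python
from datetime import datetime, timezone

# Parse a formatted timestamp string
t = datetime.strptime("2020-11-09 18:30:00", "%Y-%m-%d %H:%M:%S")
print(t.weekday(), t.hour)  # weekday() is 0 for Monday; useful as temporal features

# Unix timestamps (seconds since 1970-01-01 UTC) appear in many review datasets
ts = 1605000000
dt = datetime.fromtimestamp(ts, tz=timezone.utc)
print(dt.isoformat())

# Subtracting datetimes gives a timedelta
gap = datetime(2020, 12, 7) - datetime(2020, 10, 5)
print(gap.days)
```

Passing `tz=timezone.utc` makes the conversion independent of the machine's local time zone.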

**Files:** week6.py

**Lecture materials:** slides (+ annotations)

## Week 7: Data Mining in Social Networks

- Power laws and small worlds
- Random graph models
- Triads and weak ties
- HITS and PageRank
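A minimal sketch of PageRank computed by power iteration on a small directed graph. The damping factor of 0.85, the convergence tolerance, and the convention of treating pages without out-links as linking to every page are common choices assumed here; the lecture's exact formulation may differ. HITS is not shown.

```python
import numpy as np


def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """adj[i][j] = 1 if page i links to page j."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # Row-stochastic transition matrix; rows with no out-links become uniform
    P = np.where(out_deg[:, None] > 0, A / np.maximum(out_deg[:, None], 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Random surfer: follow a link with prob. damping, else jump uniformly
        r_new = (1 - damping) / n + damping * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


# Pages 0..3: 0 -> {1, 2}, 1 -> 2, 2 -> 0, 3 -> 2
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 1, 0]]
r = pagerank(adj)
print(r)
```

Page 3 has no in-links, so it only receives the uniform "teleport" share (1 − 0.85) / 4.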

- Elkan ch.14
- Networks, Crowds, and Markets (book)

**Lecture materials:** slides (+ annotations)

## Week 8: State-of-the-art Recommender Systems

**No lecture:** November 27 (Thanksgiving)

**State-of-the-art Recommender Systems**

- Bayesian Personalized Ranking
- Factorizing Personalized Markov Chains for Next-Basket Recommendation
- Personalized Ranking Metric Embedding for Next New POI Recommendation

**Real-world Applications**

- Recommending product sizes to customers
- Playlist prediction via Metric Embedding
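A minimal sketch of Bayesian Personalized Ranking trained by stochastic gradient ascent: each step samples a user u, an observed item i, and an unobserved item j, then increases ln σ(x_ui − x_uj) minus an L2 penalty, with score x_ui = β_i + γ_u · γ_i. The toy interactions, the hyperparameters, and the uniform negative sampling are assumptions for illustration, and `training_auc` is a hypothetical helper for checking the result.

```python
import numpy as np


def train_bpr(interactions, n_users, n_items, k=4, lr=0.05, lam=0.01, steps=30000, seed=0):
    """interactions: list of (user, item) index pairs of positive feedback."""
    rng = np.random.default_rng(seed)
    gamma_u = rng.normal(scale=0.1, size=(n_users, k))
    gamma_i = rng.normal(scale=0.1, size=(n_items, k))
    beta_i = np.zeros(n_items)
    positives = {}
    for u, i in interactions:
        positives.setdefault(u, set()).add(i)
    for _ in range(steps):
        u, i = interactions[rng.integers(len(interactions))]
        j = int(rng.integers(n_items))
        while j in positives[u]:  # assumes each user has some unobserved item
            j = int(rng.integers(n_items))
        x_uij = beta_i[i] - beta_i[j] + gamma_u[u] @ (gamma_i[i] - gamma_i[j])
        g = 1.0 / (1.0 + np.exp(x_uij))  # d/dx ln sigma(x) = 1 - sigma(x)
        gu, gi, gj = gamma_u[u].copy(), gamma_i[i].copy(), gamma_i[j].copy()
        gamma_u[u] += lr * (g * (gi - gj) - lam * gu)
        gamma_i[i] += lr * (g * gu - lam * gi)
        gamma_i[j] += lr * (-g * gu - lam * gj)
        beta_i[i] += lr * (g - lam * beta_i[i])
        beta_i[j] += lr * (-g - lam * beta_i[j])
    return beta_i, gamma_u, gamma_i


def training_auc(beta_i, gamma_u, gamma_i, interactions, n_items):
    """Average over users of the fraction of (observed, unobserved) pairs ranked correctly."""
    positives = {}
    for u, i in interactions:
        positives.setdefault(u, set()).add(i)
    aucs = []
    for u, pos in positives.items():
        scores = beta_i + gamma_i @ gamma_u[u]
        neg = [j for j in range(n_items) if j not in pos]
        correct = sum(scores[i] > scores[j] for i in pos for j in neg)
        aucs.append(correct / (len(pos) * len(neg)))
    return float(np.mean(aucs))


# Hypothetical data: users 0-2 interact with items 0-2, users 3-5 with items 3-5
interactions = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2),
                (3, 3), (3, 4), (4, 3), (4, 4), (4, 5), (5, 4), (5, 5)]
beta_i, gamma_u, gamma_i = train_bpr(interactions, n_users=6, n_items=6)
auc = training_auc(beta_i, gamma_u, gamma_i, interactions, n_items=6)
print(auc)
```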

**Lecture materials:** slides (+ annotations)

## Week 9: Online Advertising

- Matching & marriage problems
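One classical algorithm for the stable marriage problem is Gale-Shapley deferred acceptance: free proposers propose in order of preference, and each acceptor keeps the best proposal seen so far. This is a generic sketch assuming complete preference lists and equal-sized sides; the names and preferences are hypothetical, and the lecture may frame matching differently (e.g. in terms of ads and slots).

```python
def stable_matching(proposer_prefs, acceptor_prefs):
    """Gale-Shapley. Each prefs dict maps a name to a list of names on the
    other side, most preferred first. Returns {proposer: acceptor}."""
    # rank[a][p] = position of proposer p in acceptor a's list (lower is better)
    rank = {a: {p: r for r, p in enumerate(prefs)} for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of next acceptor to try
    engaged_to = {}                                # acceptor -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if a not in engaged_to:
            engaged_to[a] = p
        elif rank[a][p] < rank[a][engaged_to[a]]:
            free.append(engaged_to[a])  # current partner is bumped and becomes free
            engaged_to[a] = p
        else:
            free.append(p)  # rejected: p will propose to its next choice
    return {p: a for a, p in engaged_to.items()}


proposers = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
acceptors = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}
print(stable_matching(proposers, acceptors))
```

The result is stable: no proposer and acceptor both prefer each other to their assigned partners. With proposers doing the proposing, it is also the best stable matching from the proposers' side.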

- AdWords
- Bandit algorithms
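A minimal sketch of one standard bandit algorithm, UCB1, on simulated Bernoulli arms (think of each arm as an ad with an unknown click-through rate). UCB1 is one choice among several (epsilon-greedy and Thompson sampling are others); the arm probabilities and number of rounds are assumptions, and they are used only to simulate rewards.

```python
import math
import random


def ucb1(true_probs, rounds=5000, seed=0):
    """Returns per-arm play counts, empirical means, and total reward."""
    rng = random.Random(seed)
    k = len(true_probs)
    counts = [0] * k
    means = [0.0] * k
    total_reward = 0
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialize
        else:
            # empirical mean plus an exploration bonus that shrinks with more plays
            arm = max(range(k), key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if rng.random() < true_probs[arm] else 0  # simulated click
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return counts, means, total_reward


counts, means, total = ucb1([0.2, 0.5, 0.8])
print(counts, means, total)
```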

**Lecture materials:** slides (+ annotations)

## Week 10: Modeling Temporal and Sequence Data

- Sliding windows and autoregression
- Temporal dynamics in recommender systems
- Temporal dynamics in text and social networks
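A small sketch contrasting the first topic's two predictors on a time series: a sliding-window average of the previous values, and an autoregressive model x_t ≈ θ_0 + Σ_k θ_k x_{t−k} fit by least squares. The synthetic AR(2) series and the window and order of 2 are assumptions for illustration.

```python
import numpy as np


def sliding_window_average(series, window):
    """Predict x_t as the mean of the previous `window` values."""
    return np.array([np.mean(series[t - window:t]) for t in range(window, len(series))])


def fit_autoregression(series, order):
    """Least-squares AR model: x_t ~ theta_0 + sum_{k=1..order} theta_k * x_{t-k}."""
    X = np.array([[1.0] + [series[t - k] for k in range(1, order + 1)]
                  for t in range(order, len(series))])
    y = np.asarray(series[order:], dtype=float)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta, X @ theta


# Synthetic series generated by x_t = 0.6 x_{t-1} + 0.3 x_{t-2} + noise
rng = np.random.default_rng(0)
x = [0.0, 0.0]
for _ in range(500):
    x.append(0.6 * x[-1] + 0.3 * x[-2] + rng.normal(scale=0.1))
x = np.array(x)

theta, ar_pred = fit_autoregression(x, order=2)
window_pred = sliding_window_average(x, window=2)
print(theta, np.mean((ar_pred - x[2:]) ** 2), np.mean((window_pred - x[2:]) ** 2))
```

The sliding-window average is itself a special case of the AR model (θ = (0, 0.5, 0.5)), so the least-squares AR fit can never have higher training error than it.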

**Files:** week10.py

**Lecture materials:** slides (+ annotations)