This webpage is for an old version of the course; content may be out of date!

CSE 158: Web Mining and Recommender Systems

Instructor: Julian McAuley (jmcauley@eng.ucsd.edu), CSE 4102

Fall 2017, Monday/Wednesday 17:00-18:20, Galbraith Hall



"All that befalls you is part of the great Web."
-Marcus Aurelius


CSE 158 is an undergraduate course devoted to current methods for recommender systems, data mining, and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Monday/Wednesday evenings, starting October 2. Meetings are in Galbraith Hall.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes.

Office hours: I'll hold office hours on Tuesdays, 9:00-13:00, in CSE 4102. The course TAs will hold additional office hours on Mondays and Fridays, 10:00-13:00, in CSE B275. For other discussions, see the course's Piazza page.

Grading: All reports will be submitted via Gradescope and, except where the spec states otherwise, are expected to be completed individually. Your lowest homework score (of four) will be discarded, so you are welcome to skip one of the homeworks. You are also allowed a single "late day" for any report: if you submit one report late by one day there will be no penalty, but any further late reports will not be graded.

Part 1: Methods

Week | Topics | Files | References | Slides | Podcast | Homework
1 (Oct 2/4) Supervised Learning: Regression
  • Least-squares regression
  • Overfitting & regularization
  • Training, validation, and testing
50k beer reviews
non-alcoholic beer reviews
week1.py
Bishop ch.3
Elkan ch.3,6
introduction & outline
lecture 1 (w/ annotations)
lecture 2 (w/ annotations)
lecture 1
lecture 2
Homework 1
due Oct 16
2 (Oct 9/11) Supervised Learning: Classification
  • Logistic regression
  • SVMs
  • Multiclass & multilabel classification
  • How to evaluate classifiers
50k book descriptions
5k book cover images
week2.py
Bishop ch.4
Elkan ch.5,8
lecture 3 (w/ annotations)
lecture 4 (w/ annotations)
case study: Reddit popularity
lecture 3
lecture 4
3 (Oct 16/18) Dimensionality Reduction & Clustering
  • Singular value decomposition & PCA
  • K-means & hierarchical clustering
  • Community detection
Facebook ego network
week3.py
assignment 1 data
Bishop ch.9
Elkan ch.13
lecture 5 (w/ annotations)
lecture 6 (w/ annotations)
case study: social circles
lecture 5
lecture 6
Homework 2
due Oct 30

Part 2: Applications

Week | Topics | Files | References | Slides | Podcast | Homework
4 (Oct 23/25) Recommender Systems
  • Latent-factor models
  • Collaborative filtering
Elkan ch.11
lecture 7 (w/ annotations)
lecture 8 (w/ annotations)
assignment 1
lecture 7
lecture 8
Assignment 1
due Nov 20
5 (Oct 30/Nov 1) Text Mining
  • Sentiment analysis
  • Bags-of-words
  • TF-IDF
  • Stopwords, stemming, and topic models
week5.py
Elkan ch.12
lecture 9 (w/ annotations)
lecture 10 (w/ annotations)
assignment 2
lecture 9
lecture 10
Homework 3
due Nov 13
6 (Nov 6/8) MIDTERM (Nov 8)
  • Prep (Monday)
  • Exam (Wednesday)
sp15 midterm (CSE190)
fa15 midterm (CSE190)
fa15 midterm (CSE255)
wi17 midterm (CSE158)
wi17 midterm (CSE258)
midterm review (w/ annotations)
midterm prep
Assignment 2
due Dec 4
7 (Nov 13/15) Network Analysis
  • Power-laws and small-worlds
  • Random graph models
  • Triads and weak ties
  • HITS and PageRank
Elkan ch.14
Easley & Kleinberg
lecture 11 (w/ annotations)
lecture 12 (w/ annotations)
lecture 11
lecture 12
Homework 4
due Nov 27
8 (Nov 20/22) Online advertising
  • Matching & marriage problems
  • AdWords
  • Bandit algorithms
tensorflow.py
Mining Massive Datasets
lecture 13 (w/ annotations)
lecture 14 (w/ annotations)
lecture 13
lecture 14
9 (Nov 27/29) State-of-the-art Recommender Systems
  • Bayesian Personalized Ranking
  • Factorizing Personalized Markov Chains for Next-Basket Recommendation
  • Personalized Ranking Metric Embedding for Next New POI Recommendation
  • Translation-based Recommendation
Real-world Applications
  • Recommending product sizes to customers
  • Playlist prediction via Metric Embedding
  • Efficient Natural Language Response Suggestion for Smart Reply
  • Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences
lecture 15 and 16 (w/ annotations)
lecture 15
lecture 16
10 (Dec 4/6) Modeling Temporal and Sequence Data
  • Sliding windows and autoregression
  • Temporal dynamics in recommender systems
  • Temporal dynamics in text and social networks
week10.py
lecture 17 (w/ annotations)
lecture 18 (w/ annotations)
lecture 17
lecture 18