CSE 258: Web Mining and Recommender Systems

Instructor: Julian McAuley (jmcauley@eng.ucsd.edu), CSE 4102

Fall 2018, Monday/Wednesday 18:30-19:50, Peterson Hall 108



"All that befalls you is part of the great Web."
-Marcus Aurelius


CSE 258 is a graduate course devoted to current methods for recommender systems, data mining, and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Monday/Wednesday evenings, starting October 1. Meetings are in Peterson Hall 108.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes.

Office hours: I'll hold office hours on Tuesdays 9:30-13:00 in CSE 4102. The course TAs will hold additional office hours as follows:

Day      Time               Location
Monday   8:30am - 10:00am   B240A
Monday   1:00pm - 2:30pm    B250A
Friday   1:30pm - 3:30pm    B215
Friday   2:00pm - 4:00pm    Rady 4N128

For other discussions see the course's Piazza page.

Grading: All reports will be submitted via Gradescope and, except where the spec states otherwise, are expected to be completed individually. Your lowest homework score (of four) will be discarded, or you are welcome not to submit one of the homeworks.

Part 1: Methods

Week 1 (Oct 1/3) Supervised Learning: Regression
  • Least-squares regression
  • Overfitting & regularization
  • Training, validation, and testing
  Files: 50k beer reviews; non-alcoholic beer reviews; week1.py
  References: Bishop ch.3; Elkan ch.3,6
  Slides: introduction & outline; lecture 1 (w/ annotations); lecture 2 (w/ annotations)
  Podcast: lecture 1 (from FA17); lecture 2
  Homework: Homework 1, due Oct 15

Week 2 (Oct 8/10) Supervised Learning: Classification
  • Logistic regression
  • SVMs
  • Multiclass & multilabel classification
  • How to evaluate classifiers
  Files: 50k book descriptions; 5k book cover images; week2.py
  References: Bishop ch.4; Elkan ch.5,8
  Slides: lecture 3 (w/ annotations); lecture 4 (w/ annotations)
  Podcast: lecture 3; lecture 4

Week 3 (Oct 15/17) Dimensionality Reduction & Clustering
  • Singular value decomposition & PCA
  • K-means & hierarchical clustering
  • Community detection
  Files: Facebook ego network; week3.py
  References: Bishop ch.9; Elkan ch.13
  Slides: lecture 5 (w/ annotations); lecture 6 (w/ annotations); case study: reddit
  Podcast: lecture 5; lecture 6
  Homework: Homework 2, due Oct 29

Part 2: Applications

Week 4 (Oct 22/24) Recommender Systems
  • Latent-factor models
  • Collaborative filtering
  References: Elkan ch.11
  Slides: lecture 7 (w/ annotations); assignment 1
  Podcast: lecture 7
  Homework: Assignment 1, due Nov 19

Week 5 (Oct 29/31) Text Mining
  • Sentiment analysis
  • Bag-of-words
  • TF-IDF
  • Stopwords, stemming, and topic models
  Files: week5.py
  References: Elkan ch.12
  Homework: Homework 3, due Nov 14

Week 6 (Nov 5/7) MIDTERM
  • Prep (Monday)
  • Midterm (Wednesday)
  Past midterms: sp15 (CSE190); fa15 (CSE190); fa15 (CSE255); wi17 (CSE158); wi17 (CSE258); fa17 (CSE158); fa17 (CSE258)
  Homework: Assignment 2, due Dec 3

Weeks 7/8 (Nov 14/19) Network Analysis
  • NO LECTURE Nov 12 (Veterans Day)
  • Power laws and small worlds
  • Random graph models
  • Triads and weak ties
  • HITS and PageRank
  • NO LECTURE Nov 21 (Thanksgiving)
  References: Elkan ch.14; Easley & Kleinberg
  Homework: Homework 4, due Nov 26

Week 9 (Nov 26/28) Online Advertising
  • Matching & marriage problems
  • AdWords
  • Bandit algorithms
  Files: tensorflow.py
  References: Mining Massive Datasets

Week 10 (Dec 3/5) Modeling Temporal and Sequence Data
  • Sliding windows and autoregression
  • Temporal dynamics in recommender systems
  • Temporal dynamics in text and social networks
  Files: week10.py