CSE 190: Data Mining and Predictive Analytics

Fall 2015, Monday/Wednesday 17:00-18:20, CENTR 216

CSE 190 is an undergraduate course devoted to current methods for data mining and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Monday/Wednesday evenings, starting September 28. Meetings are in CENTR 216.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes.

Office hours: I'll hold office hours on Tuesdays 9:30-11:30am in CSE 4102. The course TA (Long Jin) will hold office hours on Fridays 12:00-2:00pm in EBU-3b B260A. For other discussions see the course's Piazza page.

Part 1: Methods

Week | Topics | Files | References | Slides | Podcast | Homework
1 (Sep 28/Sep 30) Supervised Learning: Regression
  • Least-squares regression
  • Overfitting & regularization
  • Training, validation, and testing
50k beer reviews
non-alcoholic beer reviews
week1.py
Bishop ch.3
Elkan ch.3,6
introduction & outline
lecture 1 (w/ annotations)
lecture 2 (w/ annotations)
lecture 1
lecture 2
Homework 1
due Oct 12
2 (Oct 5/7) Supervised Learning: Classification
  • Logistic regression
  • SVMs
  • Multiclass & multilabel classification
  • How to evaluate classifiers
50k book descriptions
5k book cover images
week2.py
Bishop ch.4
Elkan ch.5,8
lecture 3 (w/ annotations)
lecture 4 (w/ annotations)
case study
lecture 3
lecture 4
3 (Oct 12/14) Dimensionality Reduction & Clustering
  • Singular value decomposition & PCA
  • K-means & hierarchical clustering
  • Community detection
Facebook ego network
week3.py
Bishop ch.9
Elkan ch.13
lecture 5 (w/ annotations)
lecture 6 (w/ annotations)
lecture 5
lecture 6
Homework 2
due Oct 26
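
As a rough illustration of the Week 1 topics (least-squares regression, regularization, and training/validation/testing), here is a minimal sketch in plain Python. It is not the course's week1.py: the synthetic data, the function names `fit_ridge_1d` and `mse`, the 60/20/20 split, and the candidate regularization strengths are all assumptions made for illustration.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Fit y ~ theta0 + theta1 * x, minimizing squared error + lam * theta1**2.

    The offset theta0 is left unpenalized (an assumption of this sketch).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centering the data removes theta0 from the normal equations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / (sxx + lam)
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def mse(theta, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    theta0, theta1 = theta
    return sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# 60/20/20 split into training, validation, and test sets.
train_x, train_y = xs[:60], ys[:60]
valid_x, valid_y = xs[60:80], ys[60:80]
test_x, test_y = xs[80:], ys[80:]

# Pick the regularization strength with the lowest validation error.
candidate_lams = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(
    candidate_lams,
    key=lambda lam: mse(fit_ridge_1d(train_x, train_y, lam), valid_x, valid_y),
)
best_theta = fit_ridge_1d(train_x, train_y, best_lam)
test_mse = mse(best_theta, test_x, test_y)
print("chosen lambda:", best_lam, "test MSE:", test_mse)
```

The regularization strength is chosen on the validation set only; the test set is used once, to report the error of the selected model.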

Part 2: Applications

Week | Topics | Files | References | Slides | Podcast | Homework
4 (Oct 19/21) Recommender Systems
  • Latent-factor models
  • Collaborative filtering
assignment 1 data
Elkan ch.11
lecture 7 (w/ annotations)
lecture 8 (w/ annotations)
assignment 1
lecture 7
lecture 8
Assignment 1
due Nov 17
5 (Oct 26/28) Text Mining (part 1)
  • Sentiment analysis
  • Bags-of-words
  • TF-IDF
  • Stopwords, stemming, and topic models
week5.py
Elkan ch.12
lecture 9 (w/ annotations)
lecture 10 (midterm review) (w/ annotations)
lecture 9
lecture 10
Homework 3
due Nov 9
6 (Nov 2/4) MIDTERM (Nov 2)
and NO CLASS (Nov 4)
sp15 midterm
Assignment 2
due Dec 1
7 (Nov 9) Text Mining (part 2)
and NO CLASS (Nov 11)
lecture 11 (w/ annotations)
lecture 11
Homework 4
due Nov 23
8 (Nov 16/18) Network Analysis
  • Power-laws and small-worlds
  • Random graph models
  • Triads and weak ties
  • HITS and PageRank
Elkan ch.14
Easley & Kleinberg
lecture 12 (w/ annotations)
lecture 13 (w/ annotations)
assignment 2
lecture 12
lecture 13
9 (Nov 23/25) Online advertising
  • Matching & marriage problems
  • AdWords
  • Bandit algorithms
Mining Massive Datasets
lecture 14 (w/ annotations)
lecture 15 (w/ annotations)
lecture 14
lecture 15
10 (Nov 30/Dec 2) Modeling Temporal and Sequence Data
  • Sliding windows and autoregression
  • Temporal dynamics in recommender systems
  • Temporal dynamics in text and social networks
week10.py
lecture 16 (w/ annotations)
lecture 17 (w/ annotations)
case study
lecture 16
lecture 17
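
As a rough illustration of the PageRank topic listed under Week 8, here is a minimal power-iteration sketch in plain Python. It is not taken from the course materials: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the uniform handling of dangling nodes, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    links maps each node to the list of nodes it points to.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every node first receives the uniform "teleportation" share.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                # Split this node's damped rank evenly over its out-links.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling node (no out-links): spread its rank over all nodes.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: a -> b, a -> c, b -> c, c -> a.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

In the toy graph, node c is linked to by both other nodes, so it ends up with the highest score; the scores sum to 1 because each iteration redistributes all of the rank.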