This webpage is for an old version of the course; content may be out of date!

CSE 190: Data Mining and Predictive Analytics

Spring 2015, Tuesday/Thursday 18:30-19:50, CENTR 113

For the current version of this class, see here

CSE 190 is an undergraduate course devoted to current methods for data mining and predictive analytics. No previous background in machine learning is required, but all participants should be comfortable with programming (all example code will be in Python), and with basic optimization and linear algebra.

The course meets twice a week on Tuesday/Thursday evenings, starting March 31. Meetings are in CENTR 113.

There is no textbook for the course, though chapter references will be provided from Pattern Recognition and Machine Learning (Bishop), and from Charles Elkan's 2013 course notes.

Office hours: I'll hold office hours on Wednesday 1-3pm in CSE 4102. Additional office hours will be held by Long Jin (Friday 12:30-2:30pm in EBU3B B275) and Pranay Kumar Myana (Monday 5-7pm in EBU3B B250A). For other discussions, see the course's Piazza page.

Part 1: Methods

Week | Topics | Files | References | Slides | Podcast | Homework
1 (Mar 31/Apr 2) Supervised Learning: Regression
  • Least-squares regression
  • Overfitting & regularization
  • Training, validation, and testing
50k beer reviews
non-alcoholic beer reviews
week1.py
Bishop ch.3
Elkan ch.3,6
lecture 1
lecture 2
case study: reddit
lecture 1
lecture 2
Homework 1
due April 14
2 (Apr 7/9) Supervised Learning: Classification
  • Logistic regression
  • SVMs
  • Multiclass & multilabel classification
  • How to evaluate classifiers
50k book descriptions
5k book cover images
week2.py
Bishop ch.4
Elkan ch.5,8
lecture 3
lecture 4
lecture 3
lecture 4
3 (Apr 14/16) Dimensionality Reduction & Clustering
  • Singular value decomposition & PCA
  • K-means & hierarchical clustering
  • Community detection
facebook ego network
week3.py
Bishop ch.9
Elkan ch.13
lecture 5
lecture 6
case study: social circles
lecture 5
lecture 6
Homework 2
due April 28
4 (Apr 21) Graphical Models & Interdependent Variables
  • Directed and undirected models
Bishop ch.8
lecture 7
lecture 7
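As an illustration of the Week 1 topics (least-squares regression, regularization, and training/validation/test splits), here is a minimal sketch in Python. It assumes NumPy. The helper names `fit_ridge` and `mse` and the synthetic data are hypothetical stand-ins and are not taken from week1.py. The sketch fits an L2-regularized least-squares model in closed form, picks the regularization strength on a validation set, and reports test error only once at the end.

```python
import numpy as np

def fit_ridge(X, y, lam=0.0):
    """Minimize ||X theta - y||^2 + lam * ||theta||^2 in closed form.

    Note: this sketch penalizes every coefficient, including the intercept.
    """
    d = X.shape[1]
    # Regularized normal equations: (X^T X + lam * I) theta = X^T y
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y)

def mse(X, y, theta):
    """Mean squared error of the predictions X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

# Hypothetical synthetic data (true model: y = 2 + 3x plus small noise).
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
X = np.column_stack([np.ones_like(x), x])  # constant column for the intercept
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=200)

# 50/25/25 split into training, validation, and test sets.
X_train, y_train = X[:100], y[:100]
X_valid, y_valid = X[100:150], y[100:150]
X_test, y_test = X[150:], y[150:]

# Select lambda by validation error, then evaluate once on the test set.
lams = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda l: mse(X_valid, y_valid, fit_ridge(X_train, y_train, l)))
theta = fit_ridge(X_train, y_train, best_lam)
print("lambda:", best_lam, "theta:", theta, "test MSE:", mse(X_test, y_test, theta))
```

Increasing `lam` shrinks the fitted coefficients toward zero. This trades a small amount of bias for lower variance, which is the overfitting/regularization trade-off the validation set is used to tune.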

Part 2: Applications

Week | Topics | Files | References | Slides | Podcast | Homework
4/5 (Apr 23/28/30) Recommender Systems
  • Latent-factor models
  • Collaborative filtering
assignment 1 data
homework 3 data
baselines.py
Elkan ch.11
lecture 8
assignment 1
lecture 9
case study: beer experts
lecture 10 (midterm review)
lecture 8
lecture 9
lecture 10
Homework 3
due May 12
Assignment 1
due May 20
6 (May 5/7) MIDTERM (May 5)
and Text Mining (part 1)

week6.py
Elkan ch.12
lecture 11
assignment 2
lecture 11
Assignment 2
due June 2
reports
7 (May 12/14) Guest lecture (Manuel Gomez Rodriguez, May 12)
Miscellaneous stuff (May 14)
  • Assignment 1 hints and tips
  • Assignment 2
  • Midterm recap
homework 3.2 solution
guest lecture
lecture 12
guest lecture
lecture 12
Homework 4
due May 26
8 (May 19/21) Text Mining (part 2)
  • Sentiment analysis
  • Bags-of-words
  • TF-IDF
  • Stopwords, stemming, and topic models
Elkan ch.12
lecture 13
lecture 14
lecture 13
lecture 14
9 (May 26/28) Network Analysis
  • Power-laws and small-worlds
  • Random graph models
  • Triads and weak ties
  • HITS and PageRank
Elkan ch.14
Easley & Kleinberg
lecture 15
lecture 16
lecture 15
lecture 16
10 (Jun 2/4) Modeling Temporal and Sequence Data
  • Sliding windows and autoregression
  • Hidden Markov Models
  • Temporal dynamics in recommender systems
  • Temporal dynamics in text and social networks
week10.py
lecture 17
lecture 18
lecture 17
lecture 18
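To illustrate one of the Week 9 network-analysis topics, here is a minimal sketch of PageRank computed by power iteration in Python. It assumes NumPy and a directed graph given as a list of (source, target) edges over nodes 0..n-1. The function name `pagerank`, the damping factor of 0.85, and the choice to spread the score of dangling nodes (nodes with no out-links) uniformly over all nodes are assumptions of this sketch and do not come from the lecture materials.

```python
import numpy as np

def pagerank(edges, n, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank for a directed graph with nodes 0..n-1.

    edges: list of (source, target) pairs. Scores always sum to 1.
    """
    out_degree = np.zeros(n)
    for s, _ in edges:
        out_degree[s] += 1
    r = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        new = np.zeros(n)
        # Each node splits its current score evenly among its out-links.
        for s, t in edges:
            new[t] += r[s] / out_degree[s]
        # Score held by dangling nodes is redistributed uniformly.
        dangling = r[out_degree == 0].sum()
        # Random-surfer teleportation with probability (1 - damping).
        new = damping * (new + dangling / n) + (1 - damping) / n
        if np.abs(new - r).sum() < tol:
            return new
        r = new
    return r

# Small example: a 3-cycle 0 -> 1 -> 2 -> 0, plus node 3 linking into node 2.
r = pagerank([(0, 1), (1, 2), (2, 0), (3, 2)], n=4)
print(r)
```

In this example node 3 has no in-links, so its score is just the teleportation mass (1 - 0.85) / 4. Node 2 receives links from both node 1 and node 3 and ends up with the highest score.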