CSE 291B. Unsupervised Learning
Professor: Lawrence Saul
Lectures: Tue/Thu 11:00 am - 12:20 pm
Location: Warren Lecture Hall 2209
Office hours: after class and/or by appointment.
Units: 2 or 4
The lectures in this course will survey leading algorithms for unsupervised learning and high-dimensional data analysis. The first part of the course will cover probabilistic generative models of high-dimensional data, including Gaussian mixture models, factor analysis, nonnegative matrix factorization, exponential family PCA, probabilistic latent semantic analysis, and latent Dirichlet allocation. The second part of the course will cover spectral methods for dimensionality reduction, including multidimensional scaling, Isomap, maximum variance unfolding, locally linear embedding, graph Laplacian methods, spectral clustering, and kernel PCA. The third part of the course will consist of student presentations on related topics of interest.
The course is aimed at graduate students in machine learning and related fields. Students should have earned a high grade in a previous, related course, such as CSE 250A, CSE 250B, ECE 271A, or ECE 271B. The course will be taught by lecture in the same style as CSE 250A, though at a more advanced mathematical level.
Students may take the course for either two or four units of credit. For two units, students will be required to give a 20-minute oral presentation on an assigned reading or research project. They may also be required to submit "scribe notes" (handwritten or typeset) on a small subset of lectures. For four units, students will additionally be required to complete occasional (but challenging) problem sets.
- HW 1 (out Jan 29, due Feb 12)
- HW 2 (out Mar 3, due Mar 17)
Tue Jan 6: Course overview. Review of clustering: k-means algorithm, Gaussian mixture modeling. (A minimal k-means sketch appears below the schedule.)
Thu Jan 8: Review of linear dimensionality reduction: principal component analysis, factor analysis.
Tue Jan 13: EM algorithms for factor analysis, principal component analysis, and mixtures of factor analyzers.
Thu Jan 15: Nonnegative matrix factorization: cost functions and multiplicative updates. (A sketch of the updates appears below the schedule.)
Tue Jan 20: Nonnegative matrix factorization: auxiliary functions and proofs of convergence.
Thu Jan 22: Exponential family PCA.
Tue Jan 27: Random projection trees (guest lecture by Yoav Freund).
Thu Jan 29: Document modeling: bag-of-words representation, probabilistic latent semantic indexing, Dirichlet models.
Tue Feb 3: Latent Dirichlet allocation.
Thu Feb 5: Variational approximations for inference.
Tue Feb 10: Singular value decomposition, low-rank matrix approximations, multidimensional scaling. (A truncated-SVD sketch appears below the schedule.)
Thu Feb 12: Manifold learning, Isomap algorithm.
Tue Feb 17: Nystrom approximation; maximum variance unfolding (MVU).
Thu Feb 19: Spectral clustering, normalized cuts, graph partitioning.
Tue Feb 24: Laplacian eigenmaps, locally linear embedding (LLE).
Thu Feb 26: Low-rank factorizations for MVU, kernel PCA; class evaluations.
Tue Mar 3: Student presentations (Boyko Kakaradov, Josh Lewis, Cheng-Xian Li, Adam Koerner).
Thu Mar 5: Student presentations (Aditya Menon, Tim Mullen, Matt Rodriguez, Andrea Vattani, Priya Velu).
Tue Mar 10: Student presentations (Pouya Bozorgmehr, Heejin Choi, Eric Christiansen, Timothy Fair).
Thu Mar 12: Student presentations (Kristen Jackie, Han Kim, Justin Ma, Kei Ma, Vicente Malave).
Thu Mar 19: Student presentations (Peter Shin, Il-Young Son, Andrew Stout, Walter Talbott, Robert Thomas, Alex Tsiatas, Kai Wang, Wensong Xu).
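To give a flavor of the material, a few illustrative code sketches follow. They are minimal NumPy implementations under assumptions of my own (function names, initialization, and stopping rules), not code distributed with the course.

First, a minimal k-means sketch for the Jan 6 clustering review, alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points:

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        """Cluster the rows of X into k groups (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        # Initialize centroids at k distinct data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Assignment step: nearest centroid in squared Euclidean distance.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Update step: each centroid moves to the mean of its cluster;
            # an empty cluster keeps its old centroid.
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):
                break  # centroids are stable, so the algorithm has converged
            centroids = new_centroids
        return centroids, labels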
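Next, a sketch of the multiplicative updates for nonnegative matrix factorization under squared (Frobenius) error, in the style of the updates covered on Jan 15. The small constant added to each denominator is my own guard against division by zero; the fact that each update never increases the cost is what the auxiliary-function arguments of Jan 20 establish.

    import numpy as np

    def nmf(V, r, n_iters=200, seed=0, eps=1e-10):
        """Factor nonnegative V (m x n) as W @ H, minimizing ||V - WH||_F^2."""
        rng = np.random.default_rng(seed)
        m, n = V.shape
        W = rng.random((m, r))
        H = rng.random((r, n))
        for _ in range(n_iters):
            # H <- H * (W^T V) / (W^T W H): elementwise multiplicative update.
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            # W <- W * (V H^T) / (W H H^T): likewise preserves nonnegativity.
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H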
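Finally, a short sketch of the best rank-k approximation by truncated SVD, the Eckart-Young construction behind the Feb 10 lecture on low-rank matrix approximations:

    import numpy as np

    def low_rank_approx(A, k):
        """Best rank-k approximation to A in Frobenius norm (Eckart-Young)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        # Keep only the k largest singular values and their singular vectors.
        return (U[:, :k] * s[:k]) @ Vt[:k]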
Readings

Nonnegative matrix factorization
Exponential family PCA
M. Collins, S. Dasgupta, and R. E. Schapire (2002). A generalization of principal component analysis to the exponential family. In T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, pages 617-624. MIT Press: Cambridge, MA.
I. Rish, G. Grabarnik, G. Cecchi, F. Pereira, and G. J. Gordon (2008). Closed-form supervised dimensionality reduction with generalized linear models. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08), pages 832-839. Helsinki, Finland.
Multidimensional scaling and Nystrom approximation
Isomap and extensions
Maximum variance unfolding
K. Q. Weinberger and L. K. Saul (2006). An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), pages 1683-1686. Boston, MA.
J. Sun, S. Boyd, L. Xiao, and P. Diaconis (2006). The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review 48(4):681-699.
K. Q. Weinberger, F. Sha, Q. Zhu, and L. K. Saul (2007). Graph Laplacian regularization for large-scale semidefinite programming. In B. Schoelkopf, J. Platt, and T. Hofmann (eds.), Advances in Neural Information Processing Systems 19, pages 1489-1496. MIT Press: Cambridge, MA.
Graph Laplacian methods
Locally linear embedding and related work
B. Schoelkopf, A. J. Smola, and K. R. Mueller (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10:1299-1319.
J. Hamm, D. D. Lee, S. Mika, and B. Schoelkopf (2004). A kernel view of the dimensionality reduction of manifolds. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04), pages 369-376. Banff, Canada.
K. Q. Weinberger, F. Sha, and L. K. Saul (2004). Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04), pages 839-846. Banff, Canada.