CSE 291F. Unsupervised Learning
Spring 2010
Professor: Lawrence Saul
Lectures: Tue/Thu 11:00 am - 12:20 pm
Location: Cognitive Science Building 004
Office hours: after class and/or by appointment.
Units: 4
The lectures in this course will survey leading algorithms for unsupervised learning and high-dimensional data analysis. The first part of the course will cover probabilistic/generative models of high-dimensional data, including Gaussian mixture models, factor analysis, nonnegative matrix factorization, exponential family PCA, probabilistic latent semantic analysis, latent Dirichlet allocation, independent component analysis, and learning in deep neural networks. The second part of the course will cover spectral methods for dimensionality reduction, including multidimensional scaling, Isomap, maximum variance unfolding, locally linear embedding, graph Laplacian methods, spectral clustering, and kernel PCA.
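As a flavor of the review material in the opening lecture, here is a minimal NumPy sketch of the k-means algorithm; the toy data, the choice of k, and the random initialization are illustrative assumptions, not part of the course materials.

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        # Minimal k-means: alternate assignment and centroid-update steps.
        rng = np.random.default_rng(seed)
        # Initialize centroids at k distinct data points chosen at random (illustrative choice).
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Assignment step: each point joins the cluster of its nearest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step: each centroid moves to the mean of its assigned points.
            new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                      else centroids[j] for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return centroids, labels

    # Toy example: two well-separated 2D Gaussian blobs (hypothetical data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), rng.normal(5.0, 1.0, size=(50, 2))])
    centroids, labels = kmeans(X, k=2)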
The course is aimed at graduate students in machine learning and related fields. Students should have earned a high grade in a previous, related course, such as CSE 250A, CSE 250B, ECE 271A, or ECE 271B. The course will be taught by lecture in the same style as CSE 250A, though at a more advanced mathematical level. Auditors are welcome, subject to available seating.
Enrollment is by permission of the instructor.
There will be three homework assignments (60%) and a final course project (40%). Students may also be required to submit "scribe notes" (handwritten or typeset) on a small subset of lectures.
Tue Mar 30 | Course overview. Review of clustering: k-means algorithm, Gaussian mixture modeling.
Thu Apr 01 | Review of linear dimensionality reduction: principal component analysis, factor analysis.
Tue Apr 06 | EM algorithms for factor analysis, principal component analysis, and mixture of factor analyzers.
Thu Apr 08 | Nonnegative matrix factorization: cost functions and multiplicative updates.
Tue Apr 13 | Nonnegative matrix factorization: auxiliary functions and proofs of convergence.
Thu Apr 15 | Exponential family PCA.
Tue Apr 20 | Multiview dimensionality reduction.
Thu Apr 22 | Document modeling: bag-of-words representation, probabilistic latent semantic indexing, Dirichlet models.
Tue Apr 27 | Latent Dirichlet allocation.
Thu Apr 29 | Variational approximations for inference.
Tue May 04 | Independent component analysis: maximum likelihood, contrast functions.
Thu May 06 | Independent component analysis: fastICA, blind source separation. Example project presentation.
Tue May 11 | Singular value decomposition, low-rank matrix approximations, multidimensional scaling.
Thu May 13 | Manifold learning, Isomap algorithm.
Tue May 18 | Nystrom approximation; maximum variance unfolding (MVU).
Thu May 20 | Spectral clustering, normalized cuts, graph partitioning.
Tue May 25 | Laplacian eigenmaps, locally linear embedding (LLE).
Thu May 27 | Low-rank factorizations for MVU, kernel PCA; class evaluations.
Tue Jun 01 | Student presentations: Avinash Atreya, Do-Kyun Kim, Mandar Dixit & Weixin Li, Maria Mubin, Matan Hofri.
Thu Jun 03 | Student presentations: Michael Zhang, Randy West, Son Pham, Sushma Bannur, Tomoki Tsuchida, Ying Li.
Readings by topic:
Probabilistic PCA
Nonnegative matrix factorization
Exponential family PCA
M. Collins, S. Dasgupta, and R. E. Schapire (2002). A generalization of principal component analysis to the exponential family. In T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, pages 617-624. MIT Press: Cambridge, MA.
I. Rish, G. Grabarnik, G. Cecchi, F. Pereira, and G. J. Gordon (2008). Closed-form supervised dimensionality reduction with generalized linear models. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08), pages 832-839. Helsinki, Finland.
Document modeling
Independent component analysis
Deep architectures
Multidimensional scaling and Nystrom approximation
Isomap and extensions
Maximum variance unfolding
K. Q. Weinberger and L. K. Saul (2006). An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), pages 1683-1686. Boston, MA.
J. Sun, S. Boyd, L. Xiao, and P. Diaconis (2006). The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Review 48(4):681-699.
K. Q. Weinberger, F. Sha, Q. Zhu, and L. K. Saul (2007). Graph Laplacian regularization for large-scale semidefinite programming. In B. Schoelkopf, J. Platt, and T. Hofmann (eds.), Advances in Neural Information Processing Systems 19, pages 1489-1496. MIT Press: Cambridge, MA.
Spectral clustering
Graph Laplacian methods
Locally linear embedding and related work
Kernel PCA
B. Schoelkopf, A. J. Smola, and K.-R. Mueller (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10:1299-1319.
J. Hamm, D. D. Lee, S. Mika, and B. Schoelkopf (2004). A kernel view of the dimensionality reduction of manifolds. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04), pages 369-376. Banff, Canada.
K. Q. Weinberger, F. Sha, and L. K. Saul (2004). Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04), pages 839-846. Banff, Canada.