Project Types
Your project can be of three types -- a literature survey, an implementation project or a research project.
For a literature survey project, you are expected to read a few papers on a coherent topic, and present an overview of the literature in the area. Some suggestions for papers and topics are provided below. Keep in mind that the papers provided below are seed papers, and in all likelihood you will need to read more papers than those on the list for each topic. For literature survey projects, I expect you to form your own opinion of the existing work -- how the literature fits together, where the gaps are, and so on. During your presentation, I reserve the right to ask you questions about any of the papers that you read.
For an implementation project, you are expected to implement some machine-learning algorithms and apply them to some datasets, with a view towards connecting them to the underlying theory. For example, you could compare the performance of an algorithm against theoretical performance bounds, to see the extent to which they agree. A very good type of implementation project is to test whether certain assumptions commonly made in theoretical analysis actually hold for real data. You should also provide a list of datasets that you plan to use.
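As a rough illustration of comparing an algorithm against a theoretical bound, the sketch below runs empirical risk minimization over a small finite class of threshold classifiers on synthetic data, and compares the observed gap between training and test error with the standard Hoeffding-plus-union-bound guarantee for finite hypothesis classes. Everything here is an assumption made for the example: the synthetic data model, the 10% label noise, the grid of 21 thresholds, and the function names are hypothetical and not part of the course material. A real project would use the datasets and the bound relevant to the chosen algorithm.

```python
import math
import random


def hoeffding_gap_bound(n, num_hypotheses, delta):
    """Uniform deviation bound for a finite hypothesis class.

    With probability at least 1 - delta over n i.i.d. samples, every
    hypothesis h in the class satisfies
    |train_error(h) - true_error(h)| <= the returned value.
    """
    return math.sqrt((math.log(num_hypotheses) + math.log(2.0 / delta)) / (2.0 * n))


def sample(n, rng, true_threshold=0.3, noise=0.1):
    """Draw n points x ~ Uniform[0, 1]; label is 1 if x >= true_threshold,
    flipped independently with probability `noise` (illustrative data model)."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x >= true_threshold else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def error(threshold, data):
    """Fraction of points misclassified by the rule 'predict 1 iff x >= threshold'."""
    mistakes = sum(1 for x, y in data if (1 if x >= threshold else 0) != y)
    return mistakes / len(data)


def run_experiment(n_train=500, n_test=20000, delta=0.05, seed=0):
    rng = random.Random(seed)
    hypotheses = [i / 20 for i in range(21)]  # thresholds 0.00, 0.05, ..., 1.00
    train = sample(n_train, rng)
    test = sample(n_test, rng)  # large held-out set as a proxy for true error
    best = min(hypotheses, key=lambda t: error(t, train))  # empirical risk minimizer
    train_err = error(best, train)
    test_err = error(best, test)
    bound = hoeffding_gap_bound(n_train, len(hypotheses), delta)
    return train_err, test_err, bound


if __name__ == "__main__":
    train_err, test_err, bound = run_experiment()
    print(f"train error:  {train_err:.4f}")
    print(f"test error:   {test_err:.4f}")
    print(f"observed gap: {abs(test_err - train_err):.4f}")
    print(f"bound on gap: {bound:.4f}")
```

Repeating the experiment over several training-set sizes and plotting the observed gap next to the bound is one simple way to see how loose or tight the theory is on a given dataset.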
A research project involves theoretical research. You could decide to attack an existing research problem; in this case, I expect you to explain how current approaches to the problem fall short, and how you would like to overcome these shortcomings. Alternatively, you could also come up with a new model, presumably an extension of earlier models. In this case, you need to motivate your model, explain why the existing models are inadequate, and how your model overcomes the issues. Your ideas and your approach should be novel; I expect to see a section in your report and your talk on a literature search which indicates that your model or your approach has not been tried before.
Keep in mind that research projects can be very hard; while you do not have to produce new results, you should provide a list of approaches that you tried, motivate these approaches, and explain why they didn't work.
If you decide to do an implementation project or a theoretical research project, please make an appointment to see me as soon as possible.
Project Milestones
The project will be done in groups of two to four students. Your project group may be different from your homework group.
The project will be graded based on a final report and an in-class presentation. There are four milestones associated with the project.
- Project Proposal. Send me an email with the names of your project partners, and a brief, one-page description of the goals of your project. If you choose a literature survey project, your description should contain a tentative list of papers that you propose to read, and why these papers seem related. If you choose an implementation or a research project, your description should include a brief summary of the goals of your project -- for example, which algorithm you plan to implement and what hypothesis you wish to test, or what research problem you wish to attack and why.
The project proposal is due Monday October 22, 5pm. If you need help selecting a project, please send me an email and make an appointment to see me before the deadline.
- Midterm Progress Report. This is a one-page progress report. For literature survey projects, it should include the papers you have already read, and how they are related. It is very likely that you may wish to add or remove papers from your initial list based on the first few papers read; if there are any such revisions, please specify what they are and why you would like to make them. For implementation or research projects, write down a brief summary of the progress made and the results (both positive and negative) obtained so far. If you haven't obtained any concrete results, write down the approaches that you tried.
The progress report is due Monday November 12, 5pm by email. If you are stuck with your project, please make an appointment to see me before the deadline.
- Project Presentations. The project presentations will be held in class on Wednesday November 28 and Monday December 3. You do not need to use slides for your presentation; if you do, please send me a pdf copy of your slides by noon on December 4 for posting on the class website.
More details on the project presentations will be announced shortly.
- Final Report. The final report is due Monday December 10, 5pm by email. The final report must be typed and should be no more than 10 pages, including references, in 11-point or larger font. Hand-written reports will not be accepted. No late final reports will be accepted.
The final report will be evaluated on its clarity as well as its technical quality. For literature survey projects, the report should include a section on your own interpretation of the existing literature -- for example, how you think the literature fits together and where you think the gaps are. For implementation projects, the report should include a section on how your results agree with the theory (or not), and why you think this is the case. For research projects, the report should contain a section on the related work which explains how your problem fits into the existing literature, and should motivate the model and the approach that you used.
Project Ideas
Below is a list of topics, along with a couple of recent references on each. For more topic ideas, consult the recent proceedings of COLT, NIPS or ICML. For literature survey projects, I would expect you to read more papers than just these.
Classifiers that abstain
- Lihong Li, Michael L. Littman, Thomas J. Walsh, Alexander L. Strehl: Knows what it knows: a framework for self-aware learning. Machine Learning 82(3): 399-443 (2011)
- Amin Sayedi, Morteza Zadimoghaddam, Avrim Blum: Trading off Mistakes and Don't-Know Predictions. NIPS 2010
Privacy-preserving Classification
- Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, Adam Smith: What Can We Learn Privately? SIAM J. Comput. 40(3): 793-826 (2011)
- Kamalika Chaudhuri, Claire Monteleoni, Anand Sarwate: Differentially Private Empirical Risk Minimization, JMLR 2011
- Kamalika Chaudhuri, Daniel Hsu: Sample Complexity Bounds for Differentially Private Classification, COLT 2011
Distributed Machine Learning
- Maria-Florina Balcan, Avrim Blum, Shai Fine, Yishay Mansour: Distributed Learning, Communication Complexity and Privacy, COLT (2012)
- Hal Daume III, Jeff M. Phillips, Avishek Saha, Suresh Venkatasubramanian: Efficient Protocols for Distributed Classification and Optimization, arXiv 2012
Privacy and Streaming Computation
- Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin: Pan-private Streaming Algorithms, ICS 2010
- Darakhshan Mir, S. Muthukrishnan, Aleksandar Nikolov, Rebecca N. Wright: Pan-private Algorithms via Statistics on Sketches, PODS 2011
Contextual Bandits
- John Langford and Tong Zhang: The epoch-greedy algorithm for multiarmed bandits with side information, NIPS 2007
- Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang: Efficient optimal learning for contextual bandits, UAI 2011
Selective Sampling
- Ran El-Yaniv, Enav Wiener: Agnostic selective classification, NIPS 2011
- Ran El-Yaniv, Enav Wiener: On the Foundations of Noise-free Selective Classification, JMLR 2010
Models of Clustering
- Maria-Florina Balcan, Avrim Blum, Santosh Vempala: A discriminative framework for clustering via similarity functions. STOC 2008: 671-680
- Maria-Florina Balcan, Avrim Blum, Anupam Gupta: Approximate clustering without the approximation. SODA 2009: 1068-1077
Spectral Learning
- Kamalika Chaudhuri, Sham M. Kakade, Karen Livescu, Karthik Sridharan: Multi-view clustering via canonical correlation analysis. ICML 2009
- Daniel Hsu, Sham M. Kakade, Tong Zhang: A spectral algorithm for learning Hidden Markov Models. J. Comput. Syst. Sci. 78(5): 1460-1480 (2012)
- Animashree Anandkumar, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Yi-Kai Liu: Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation CoRR abs/1204.6703: (2012)
Multiclass Classification
- Amit Daniely, Sivan Sabato, Shai Shalev-Shwartz: Multiclass Learning Approaches: A Theoretical Comparison with Implications, NIPS 2012
- Amit Daniely, Sivan Sabato, Shai Ben-David, Shai Shalev-Shwartz: Multiclass Learnability and the ERM Principle, COLT, 2011.