CSE 259: AI Seminar

Instructor: Julian McAuley (jmcauley@eng.ucsd.edu), CSE 4102

Autumn 2019, Monday 12:00-13:00, CSE (EBU-3b) 4140

CSE 259 is a weekly seminar course that covers current topics in Machine Learning and Artificial Intelligence.

Week 1 (September 30)

Audra McMillan (Boston University / Northeastern University)


Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. In this talk we'll address a basic question about privately testing simple hypotheses: given two distributions P and Q, and a privacy level ε, how many i.i.d. samples are needed to distinguish P from Q subject to ε-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we'll characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level ε, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. This result is an analogue of the classical Neyman–Pearson lemma in the setting of private hypothesis testing. The characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.

The full paper is available at: https://arxiv.org/abs/1811.11148. Joint work with Clément Canonne, Gautam Kamath, Adam Smith, and Jonathan Ullman.
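The clamped log-likelihood ratio statistic described in the abstract can be sketched in a few lines. The following is a minimal illustration for discrete distributions, not the paper's actual test (which also randomizes the statistic): the function name, the fixed clamp, the threshold, and the use of the Laplace mechanism are assumptions made here for illustration.

```python
import math
import random

def private_llr_test(samples, p, q, eps, clamp=1.0, threshold=0.0):
    """Distinguish P from Q under eps-differential privacy (illustrative sketch).

    Sums the per-sample log-likelihood ratios log(P(x)/Q(x)), each clamped to
    [-clamp, clamp] so that replacing one sample moves the statistic by at
    most 2*clamp (its sensitivity), then adds Laplace noise of scale
    2*clamp/eps before thresholding.
    p, q: dicts mapping discrete outcomes to probabilities (assumed format).
    """
    stat = sum(max(-clamp, min(clamp, math.log(p[x] / q[x]))) for x in samples)
    # Sample Laplace(0, b) noise as a difference of two Exponential(1/b) draws.
    b = 2.0 * clamp / eps
    noise = random.expovariate(1.0 / b) - random.expovariate(1.0 / b)
    return "P" if stat + noise > threshold else "Q"
```

Clamping is what makes privacy cheap here: without it, a single outlier sample could shift the statistic arbitrarily, forcing unbounded noise.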

Week 2 (October 7)

Jingbo Shang (UCSD)

Named Entity Recognition from a Data-Driven Perspective

Named entity recognition (NER) is one of the core tasks in natural language processing (NLP), and has numerous applications in various domains. Recent advances in neural NER models (e.g., LSTM-CRF) have freed human effort from handcrafting features. In this talk, we will briefly revisit these models and discuss how we can improve upon them from a data-driven perspective. The key philosophy of "data-driven" here is to enhance NER performance without introducing any additional human annotations. We will attack this problem from different angles, including pre-training & co-training language models, introducing dictionaries for distant supervision, detecting and re-weighting noisy training data, and removing the dependency on tokenizers (especially for social media posts).
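As a concrete instance of the "no additional human annotations" philosophy, distant supervision labels raw text by matching a typed entity dictionary against the token sequence. A minimal sketch, assuming a dictionary keyed by token tuples and a longest-match-first rule (both assumptions for illustration, not the speaker's actual method):

```python
def distant_labels(tokens, dictionary):
    """Assign BIO tags by matching an entity dictionary against tokens.

    dictionary: maps entity phrases (tuples of tokens) to entity types,
    e.g. {("San", "Diego"): "LOC"} -- a hypothetical input format.
    Longest match starting at each position wins; unmatched tokens get "O".
    No human annotation is involved, only the dictionary.
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(phrase) for phrase in dictionary), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in dictionary:
                etype = dictionary[phrase]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                break
        else:
            i += 1
    return tags
```

Labels produced this way are noisy (dictionary misses and ambiguous matches), which is exactly why the abstract pairs distant supervision with detecting and re-weighting noisy training data.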

Week 3 (October 14)

Internship roundup

PhD students will discuss their summer internships:

Paarth Neekhara will also present his EMNLP paper: Adversarial Reprogramming of Text Classification Neural Networks

Week 4 (October 21)

EMNLP Presentations (Part 2)

PhD students will present their EMNLP (Empirical Methods in Natural Language Processing) papers:

Week 5 (October 28)

EMNLP Presentations (Part 3)

PhD students will present their EMNLP papers:

Week 6 (November 4)

Michal Moshkovitz (UCSD)

Week 7 (November 11)

Veterans Day

No seminar

Week 8 (November 18)

Cyrus Rashtchian (UCSD)