CSE 259: UCSD AI Seminar, Spring 2021
Term: Spring Qtr 2021
| Week | Date | Speaker | Affiliation |
|---|---|---|---|
| 1 | March 29 | Jean Honorio | Assistant Professor, Purdue University |
| 2 | April 5 | Shashank Srivastava | Assistant Professor, University of North Carolina, Chapel Hill |
| 3 | April 12 | Laura Dietz | Assistant Professor, University of New Hampshire |
| 4 | April 19 | Noah Smith | Professor, University of Washington |
| 5 | April 26 | Claire Cardie | Professor, Cornell University |
| 6 | May 3 | Rada Mihalcea | Professor, University of Michigan |
| 7 | May 10 | Ed Hovy | Professor, Carnegie Mellon University |
| 8 | May 17 | Ellie Pavlick | Assistant Professor, Brown University |
| 9 | May 24 | Ani Nenkova | Associate Professor, University of Pennsylvania and Adobe Research |
| 10 | May 31 | Memorial Day | |
Interpretability Analysis for Named Entity Recognition
In this talk I will present a set of experiments designed to help us understand what neural named entity recognition systems learn and why they make the predictions we see. Specifically, we seek to understand whether systems learn name strings (Ani, Julia, Kathy) or whether they are able to identify textual contexts that constrain the semantic class of whatever word appears in that context ("My name is __"). I will present evidence that the performance of neural methods is largely driven by their ability to recognize word tokens as belonging to certain semantic classes. In a study with people, we find that in many cases people are indeed able to identify constraining contexts and infer the class from the sentence context alone. People's recognition of constraining contexts aligns better with predictions from biLSTM-CRF models than with predictions from BERT models. I will present compelling evidence that current models do not integrate contextual clues effectively. These results indicate that NER is a challenging yet practical domain for testing machine text comprehension abilities.
Ani Nenkova is a Principal Scientist at Adobe Research and associate professor of computer and information science at the University of Pennsylvania (on leave). Her work on summarization, discourse, multi-modal emotion prediction and information extraction in the biomedical domain has been recognized with best paper awards at SIGDIAL 2010, EMNLP-CoNLL 2012, AVEC 2012 and AMIA 2021. Ani was program chair for NAACL in 2016 and currently serves as editor-in-chief for the Transactions of the Association for Computational Linguistics (TACL).
You can lead a horse to water...: Representing vs. Using Features in Neural NLP
A wave of recent work has sought to understand how pretrained language models work. Such analyses have produced two seemingly contradictory sets of results. On one hand, work based on "probing classifiers" generally suggests that state-of-the-art language models contain rich information about linguistic structure (e.g., parts of speech, syntax, semantic roles). On the other hand, work that measures performance on linguistic "challenge sets" shows that models consistently fail to use this information when making predictions. In this talk, I will present a series of results that attempt to bridge this gap. Our recent experiments suggest that the disconnect is neither due to catastrophic forgetting nor (entirely) explained by insufficient training data. Rather, it is best explained in terms of how "accessible" features are to the model following pretraining, where "accessibility" can be quantified using an information-theoretic interpretation of probing classifiers.
Ellie Pavlick is an Assistant Professor of Computer Science at Brown University, where she leads the Language Understanding and Representation (LUNAR) Lab. She received her PhD from the one-and-only University of Pennsylvania. Her current work focuses on building more cognitively plausible models of natural language semantics, with an emphasis on grounded language learning and on the sample efficiency and generalization of neural language models.
NLP: The Past and 3.5 Futures
Natural Language Processing (NLP) of text and speech (also called Computational Linguistics) is just over 60 years old and is continuously evolving, not only in its technical subject matter but also in the basic questions being asked and the style and methodology used to answer them. Unification followed finite-state technology in the 1980s, giving way in the 1990s to statistical processing and machine learning as the dominant paradigm; since about 2015 deep neural methods have taken over. Large-scale processing over diverse data has brought general-level performance to a list of applications that includes speech recognition, information retrieval from the web, machine translation, information extraction, question answering, text summarization, sentiment detection, and dialogue processing. In all this work three main complementary types of research and foci of interest have emerged, each with its own goals, evaluation paradigm, and methodology: (1) the resource creators focus on the nature of language and the representations required for language processing; (2) the learning researchers focus on algorithms to effect the transformations of representation required in NLP; and (3) the large-scale system builders produce engines that win the NLP competitions and build companies. Though the latter two have fairly well-established research methodologies, the first does not, and consequently suffers in recognition and funding. However, I believe the main theoretical advances of NLP will occur here. In the talk, I describe the three trends of NLP research and pose some general questions, including: What is NLP, as a field? What is the nature of the work performed in each stream? What, if any, are the theoretical contributions of each stream? What is the likely future of each stream, and what kind of work should one choose to do if one is a grad student today?
Eduard Hovy is a research professor at the Language Technologies Institute in the School of Computer Science at Carnegie Mellon University. Starting in 2020 he served a term as Program Manager in DARPA’s Information Innovation Office (I2O), where he managed programs in Natural Language Technology and Data Analytics totaling over $25M per year. Dr. Hovy holds adjunct professorships in CMU’s Machine Learning Department and at USC (Los Angeles). Dr. Hovy completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987 and was awarded honorary doctorates from the National Distance Education University (UNED) in Madrid in 2013 and the University of Antwerp in 2015. He is one of the initial 17 Fellows of the Association for Computational Linguistics (ACL) and is also a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI). Dr. Hovy’s research focuses on computational semantics of language and addresses various areas in Natural Language Processing and Data Analytics, including in-depth machine reading of text, information extraction, automated text summarization, question answering, the semi-automated construction of large lexicons and ontologies, and machine translation. In early 2021 his Google h-index was 92, with over 46,000 citations. Dr. Hovy is the author or co-editor of eight books and over 450 technical articles and is a popular invited speaker. From 2003 to 2015 he was co-Director of Research for the Department of Homeland Security’s Center of Excellence for Command, Control, and Interoperability Data Analytics, a distributed consortium of 17 universities. In 2001 Dr. Hovy served as President of the Association for Computational Linguistics (ACL), in 2001–03 as President of the International Association for Machine Translation (IAMT), and in 2010–11 as President of the Digital Government Society (DGS). Dr. Hovy regularly co-teaches Ph.D.-level courses and has served on Advisory and Review Boards for both research institutes and funding organizations in Germany, Italy, the Netherlands, Ireland, Singapore, and the USA.
Natural Language Processing During and For a Pandemic
Language plays a central role in giving and receiving healthcare, and COVID-19 is no exception. We see language being used across the board, from patient records and publications reporting new medical findings to doctor-patient conversations and counseling interactions targeting behavior change. In this talk, I will share some of the research taking place in the Language and Information Technologies lab at the University of Michigan on natural language processing for healthcare, including work motivated by the recent pandemic on understanding the impact of COVID-19 on mental health and on dialog agents for anxiety relief.
Rada Mihalcea is the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering at the University of Michigan and the Director of the Michigan Artificial Intelligence Lab. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of the journals Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, Journal of Artificial Intelligence Research, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a program co-chair for EMNLP 2009 and ACL 2011, and a general chair for NAACL 2015 and *SEM 2019. She currently serves as ACL President. She received a Presidential Early Career Award for Scientists and Engineers from President Obama (2009) and was named an ACM Fellow (2019) and an AAAI Fellow (2021). In 2013, she was made an honorary citizen of her hometown of Cluj-Napoca, Romania.
Re-thinking Information Extraction in the Age of Neural Networks
In this talk, I’ll examine the state of information extraction, a subfield of Natural Language Processing, since its inception almost 30 years ago, and identify document-level event extraction as one important task that should be reconsidered in light of recent advances in neural networks. Next, I'll present our new work that takes a step in this direction: first re-framing event extraction as an end-to-end question answering task, and then treating event extraction as a natural language generation problem, directly generating the relevant, structured event information from the original, unstructured text of an input document in an end-to-end fashion.
Claire Cardie is the John C. Ford Professor of Engineering in the Departments of Computer Science and Information Science at Cornell University. She has worked since the early 1990s on the application of machine learning methods to problems in Natural Language Processing, on topics ranging from information extraction, noun phrase coreference resolution, text summarization, and question answering to the automatic analysis of opinions, argumentation, and deception in text. She has been Program Chair for ACL/COLING, EMNLP, and CoNLL, and General Chair for ACL in 2018. Cardie was named a Fellow of the ACL in 2015 and a Fellow of the Association for Computing Machinery (ACM) in 2019. At Cornell, she led the development of the university’s academic programs in Information Science, was the founding Chair of its Information Science Department, and is currently serving as the inaugural Associate Dean for Education in Cornell's College of Computing and Information Science.
Language Models: Challenges and Progress
Probabilistic language models are once again foundational to many advances in natural language processing research, bringing the exciting opportunity to harness raw text to build language technologies. With the emergence of deep architectures and protocols for finetuning a pretrained language model, many NLP solutions are being cast as simple variations on language modeling. This talk is about challenges in language model-based NLP and some of our work toward solutions. First, we'll consider evaluation of generated language. I'll present some alarming findings about humans and models and make some recommendations. Second, I'll turn to a ubiquitous design limitation in language modeling, the vocabulary, and present a linguistically principled, sample-efficient solution that enables modifying the vocabulary during finetuning and/or deployment. Finally, I'll delve into today's most popular language modeling architecture, the transformer, and show how the quadratic runtime of its attention layers can be made linear without affecting accuracy. Taken together, we hope these advances will broaden the population of people who can effectively use and contribute back to NLP.
Noah Smith is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, as well as a Senior Research Manager at the Allen Institute for Artificial Intelligence. Previously, he was an Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical natural language processing, machine learning, and applications of natural language processing, especially to the social sciences. His book, Linguistic Structure Prediction, covers many of these topics. He has served on the editorial boards of the journals Computational Linguistics (2009–2011), Journal of Artificial Intelligence Research (2011–present), and Transactions of the Association for Computational Linguistics (2012–present), as the secretary-treasurer of SIGDAT (2012–2015 and 2018–present), and as program co-chair of ACL 2016. Alumni of his research group, Noah's ARK, are international leaders in NLP in academia and industry; in 2017 UW's Sounding Board team won the inaugural Amazon Alexa Prize. He was named an ACL Fellow in 2020, "for significant contributions to linguistic structure prediction, computational social sciences, and improving NLP research methodology." Smith's work has been recognized with a UW Innovation award (2016–2018), a Finmeccanica career development chair at CMU (2011–2014), an NSF CAREER award (2011–2016), a Hertz Foundation graduate fellowship (2001–2006), numerous best paper nominations and awards, and coverage by NPR, BBC, CBC, New York Times, Washington Post, and Time. More Info: http://homes.cs.washington.edu/~nasmith
Retrieve-and-generate: How to Automatically Create Relevant Articles
A lot of progress has been made towards answering specific, formalized information needs, such as questions or detailed search queries. However, users who are familiarizing themselves with a new domain would like to read overviews that explain "everything that one needs to know" about a topic, instead of having to ask questions one by one. So far, such users either find an overview article on the web or a wiki, or they are left to piece together this overview on their own. The vision of complex answer retrieval is to develop algorithms that can produce comprehensive overviews for given topics such as "Zika fever", "Green Sea Turtle", or "Reducing air pollution". The success of strong neural models for language generation suggests the feasibility of this idea. However, several tasks, such as subtopic detection and story generation, need to be addressed before retrieve-and-generate systems will provide information-rich, relevant, and useful overviews. This talk gives an overview of the research advances resulting from TREC Complex Answer Retrieval and the years since. More information about TREC CAR and the datasets is available at http://trec-car.cs.unh.edu
Laura Dietz is an Assistant Professor at the University of New Hampshire, where she leads the lab for text retrieval, extraction, machine learning and analytics (TREMA). She organizes a tutorial/workshop series on Utilizing Knowledge Graphs in Text-centric Retrieval (KG4IR) and coordinates the TREC Complex Answer Retrieval Track. She received an NSF CAREER Award for utilizing fine-grained knowledge annotations in text understanding and retrieval. Previously, she was a research scientist at the Data and Web Science Group at the University of Mannheim and at the Center for Intelligent Information Retrieval (CIIR) at UMass Amherst. She obtained her doctoral degree from the Max Planck Institute for Informatics with a thesis on topic models for networked data. More Info: https://www.cs.unh.edu/~dietz
Few-shot Learning with Interactive Language
Today machine learning is largely about function approximation from large volumes of labeled data. However, humans learn through multiple mechanisms in addition to inductive inference. In particular, we can efficiently learn and communicate new knowledge about the world through natural language, and our educational systems rely on learning processes that are deeply intertwined with language, e.g., reading books, listening to lectures, and engaging in student-teacher dialogs. In this talk, we will explore some recent work on building automated learning systems that can learn new tasks through natural language interactions with their users in scenarios with limited labeled data. We will cover multiple scenarios to demonstrate this idea: learning web-based tasks from descriptions and demonstrations; using language to communicate domain knowledge for reasoning tasks; and leveraging natural language patterns in conjunction with large language models for few-shot learning.
Shashank Srivastava is an assistant professor in the Computer Science department at the University of North Carolina (UNC) Chapel Hill. Shashank received his PhD from the Machine Learning department at CMU in 2018, and was an AI Resident at Microsoft Research in 2018-19. Shashank's research interests lie in conversational AI, interactive machine learning and grounded language understanding. Shashank has an undergraduate degree in Computer Science from IIT Kanpur, and a Master’s degree in Language Technologies from CMU. He received the Yahoo InMind Fellowship for 2016-17. His research has been covered by popular media outlets including GeekWire and New Scientist.
Fair Sparse Regression with Clustering: An Invex Relaxation for a Combinatorial Problem
We study the problem of fair sparse regression on a biased dataset where bias depends upon a hidden binary attribute. The presence of a hidden attribute adds an extra layer of complexity to the problem by combining sparse regression and clustering with unknown binary labels. The corresponding optimization problem is combinatorial, but we propose a continuous relaxation, resulting in an invex optimization problem. To the best of our knowledge, this is the first invex relaxation for a combinatorial problem. We show that our method recovers the correct support of the regression parameter vector, as well as the exact value of the hidden attribute for each sample. These theoretical guarantees hold as long as the number of samples grows logarithmically with the dimension of the regression parameter vector. This result serves as a gentle introduction to a unifying framework that uses the power of continuous relaxations (beyond convexity), Karush-Kuhn-Tucker conditions, primal-dual certificates, and concentration inequalities. This framework has allowed us to produce novel algorithms for several NP-hard combinatorial problems, such as learning Bayesian networks, learning graphical games, inference in structured prediction, and community detection.
Jean Honorio is an Assistant Professor in the Computer Science Department at Purdue University, as well as in the Statistics Department (by courtesy). Prior to joining Purdue, Jean was a postdoctoral associate at MIT, working with Tommi Jaakkola. His Erdős number is 3. His work has been partially funded by NSF. He is an editorial board reviewer for JMLR and has served as a senior PC member for IJCAI and AAAI and as a PC member for NeurIPS, ICML, and AISTATS, among other conferences and journals.