CSE 259: UCSD AI Seminar, Fall 2017
Term: Fall Qtr 2017
|Week||Date||Speaker||Affiliation|
|1||October 2||Vitor Carvalho||Lead Research Scientist, Snap Research|
|2||October 9||Kai-Wei Chang||UCLA|
|3||October 16||Andrew Kahng||UCSD CSE & ECE|
|4||October 23||Oren Etzioni||CEO, Allen Institute for Artificial Intelligence (AI2)|
|5||October 30||Angela Yu||UCSD CogSci|
|6||November 6||Russell Impagliazzo||UCSD CSE|
|7||November 13||Shuai Tang||UCSD CogSci|
|8||November 20||Michael Yip||UCSD ECE|
|9||November 27||Haipeng Luo||USC|
|10||December 4||Chun-Nan Hsu||UCSD Bioinformatics|
Personalized Neural Conversation Models and other research projects at Snapchat
In this talk we will start by briefly overviewing some of the projects currently under consideration at Snap Research. Then we will focus on recent advances in Deep Learning that have sparked an interest in modeling language, particularly for personalized conversational agents that can retain contextual information during dialog exchanges. We explore and compare several of the recently proposed neural conversation models, and carry out an evaluation of the multiple factors that can affect predictive performance. Based on the tradeoffs of different models, we propose a new neural generative dialog model conditioned on speakers as well as context history that outperforms previous models on both retrieval and generative metrics.
Vitor Carvalho is a Lead Research Scientist at Snap Research. He is interested in applied research interfacing Machine Learning, Natural Language Processing, Data Mining and Search. He finished his PhD at Carnegie Mellon University working under William W. Cohen. He has worked at Qualcomm Research, Microsoft Bing, and Ericsson, and can be found at @vitroc.
Structured Prediction: Practical Advancements and Applications in Natural Language Processing
Many machine learning problems involve making joint predictions over a set of mutually dependent output variables. The dependencies between output variables can be represented by a structure, such as a sequence, a tree, a clustering of nodes, or a graph. Structured prediction models have been proposed for problems of this type, and they have been shown to be successful in many application areas, such as natural language processing, computer vision, and bioinformatics. In this talk, I will describe a collection of results that improve several aspects of these approaches. Our results lead to efficient learning algorithms for structured prediction models, which, in turn, support reductions in problem size and improvements in training and evaluation speed. I will also discuss potential risks and challenges when using structured prediction models. Related information is on my homepage.
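As a minimal illustration of joint prediction over a sequence structure, the sketch below runs Viterbi decoding on a toy chain model. It is not taken from the talk: the function name `viterbi`, the score tables, and the example numbers are all hypothetical, chosen only to show how dependencies between neighboring outputs can change the prediction.

```python
def viterbi(emission, transition):
    """Return the highest-scoring label sequence and its score.

    emission[t][y]: score of label y at position t (hypothetical toy scores).
    transition[p][y]: score of label p being followed by label y.
    """
    n_labels = len(emission[0])
    best = list(emission[0])  # best[y]: best score of a prefix ending in label y
    back = []                 # back[t-1][y]: best previous label for y at position t
    for t in range(1, len(emission)):
        new_best, pointers = [], []
        for y in range(n_labels):
            # Choose the previous label that maximizes the prefix score.
            p = max(range(n_labels), key=lambda q: best[q] + transition[q][y])
            pointers.append(p)
            new_best.append(best[p] + transition[p][y] + emission[t][y])
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    y = max(range(n_labels), key=lambda q: best[q])
    score = best[y]
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path, score


# Toy example: transitions reward keeping the same label.
emission = [[3, 0], [0, 1], [0, 1]]
transition = [[1, -5], [-5, 1]]
path, score = viterbi(emission, transition)
```

In this toy example, picking the best label at each position independently would give labels 0, 1, 1, while the joint decoder keeps label 0 throughout because switching labels is penalized.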
Bio: Kai-Wei Chang is an assistant professor in the Department of Computer Science at the University of California, Los Angeles. He has published broadly in machine learning and natural language processing. His research has mainly focused on designing machine learning methods for handling large and complex data. He has been involved in developing several machine learning libraries, including LIBLINEAR, Vowpal Wabbit, and Illinois-SL. He was an assistant professor at the University of Virginia in 2016-2017. He obtained his Ph.D. from the University of Illinois at Urbana-Champaign in 2015 and was a post-doctoral researcher at Microsoft Research in 2016. Kai-Wei was awarded the EMNLP Best Long Paper Award (2017), KDD Best Paper Award (2010), and the Yahoo! Key Scientific Challenges Award (2011). Additional information is available at http://kwchang.net
ML Problems Arising in Integrated-Circuit Design
As classic “Moore’s Law” geometric scaling slows, it has fallen upon electronic design automation (EDA) to deliver “design-based equivalent scaling” that helps to continue the Moore’s-Law scaling of semiconductor value. A powerful lever for this will be the use of machine learning (ML) techniques, both inside and “around” EDA tools. This talk will give a “lightning round” of open problem formulations for ML that arise in integrated-circuit design. Each of these examples has available data sources / datasets and “motivated customers” (e.g., at EDA companies, semiconductor product companies, and/or foundries). Relevant problem types and ML techniques span classification, active learning, clustering, reinforcement learning, etc. Some background:
Andrew B. Kahng is Professor of CSE and ECE at UC San Diego, where he holds the endowed chair in High-Performance Computing. He has served as visiting scientist at Cadence (1995-1997) and as founder/CTO at Blaze DFM (2004-2006). He is the coauthor of 3 books and over 400 journal and conference papers, holds 33 issued U.S. patents, and is a fellow of ACM and IEEE. He has served as general chair of DAC, ISQED, ISPD and other conferences. He served as international chair/co-chair of the Design technology working group, and of the System Integration focus team, for the International Technology Roadmap for Semiconductors (ITRS) from 2000 to 2016. His research interests include IC physical design and performance analysis, the IC design-manufacturing interface, combinatorial algorithms and optimization, and the roadmapping of systems and technology.
The Future of AI
Given the rapid advances in AI recently, what will the field look like in 5 to 10 years? What are open problems that deep learning and reinforcement learning are not able to solve? And how will AI advances affect our society? My talk will address these questions in a non-technical manner.
Dr. Oren Etzioni is Chief Executive Officer of the Allen Institute for Artificial Intelligence. He has been a Professor at the University of Washington's Computer Science department since 1991, receiving several awards including Seattle's Geek of the Year (2013), the Robert Engelmore Memorial Award (2007), the IJCAI Distinguished Paper Award (2005), AAAI Fellow (2003), and a National Young Investigator Award (1993). He has been the founder or co-founder of several companies including Farecast (sold to Microsoft in 2008) and Decide (sold to eBay in 2013). He has written commentary on AI for the New York Times, Nature, Wired, and the MIT Technology Review. He helped to pioneer meta-search (1994), online comparison shopping (1996), machine reading (2006), and Open Information Extraction (2007). He has authored over 100 technical papers that have garnered over 1,800 highly influential citations on Semantic Scholar. He received his Ph.D. from Carnegie Mellon University in 1991, and his B.A. from Harvard in 1986.
Computational Modeling of Human Face Processing
Humans excel in certain kinds of high-dimensional data processing, such as the processing of face images. Even very young children readily solve challenging computational problems such as individual recognition, emotion classification, and social trait assessment (e.g. attractiveness and trustworthiness). In this talk, I will discuss our recent work using a statistical framework (the Active Appearance Model, AAM) to model human face processing. It has enabled us to examine what facial features drive human perception of social traits, how the underlying representation and computation reconfigure depending on the behavioral context, how the process depends on the past experiences of the observer, and how the computations may be carried out in the brain. In particular, we demonstrate how cognitive functions such as memory and attention play an important role in human face processing. This work may be of interest to the machine learning/artificial intelligence community, both for artificial systems that interact socially with humans, and for those that may benefit from incorporating the computational principles supporting some of the most developed aspects of human visual perception expertise.
Prof. Angela Yu is an Associate Professor in the Department of Cognitive Science at UCSD.
Learning models: connections between boosting, hard-core distributions, dense models, GAN, and regularity
A theme that cuts across many domains of computer science and mathematics is to find simple representations of complex mathematical objects such as graphs, functions, or distributions on data. These representations need to capture how the object interacts with a class of tests, and to approximately determine the outcome of these tests. For example, a scientist is trying to find a mathematical model explaining observed data about some phenomenon, such as kinds of butterflies in a forest. A minimal criterion for success is that the model should accurately predict the results of future observations. When is this possible? This general situation arises in many contexts in computer science and mathematics. In machine learning, the object might be a distribution on data points, high dimensional real vectors, and the tests might be half-spaces. The goal would be to learn a simple representation of the data that determines the probability of any half-space or possibly intersections of half-spaces. In computational complexity, the object might be a Boolean function or distribution on strings, and the tests are functions of low circuit complexity. In graph theory, the object is a large graph, and the tests are the cuts in the graph; the representation should determine approximately the size of any cut. In additive combinatorics, the object might be a function or distribution over an Abelian group, and the tests might be correlations with linear functions or polynomials.
In particular, in this talk, we'll focus on how certain families of boosting algorithms can be used to construct GAN-like algorithms. This construction is based on a reduction from dense model theorems from additive number theory to the hard-core distribution lemma from computational complexity. We call these algorithms GAB, generation from adversarial boosting. These algorithms have strong theoretical guarantees, both for when they will succeed in finding a distribution similar to the samples and for not over-fitting the generated distribution to the specific samples. Arora, Ge, Liang, Ma and Zhang observed that standard GANs can converge to distributions of much lower entropy than the sampled distribution, and Arora and Zhang have performed empirical studies showing that this indeed occurs in practice, at least some of the time. In contrast, we give a version of GAB that is guaranteed to produce a distribution with entropy that is not only within a fraction of a bit of the sampled distribution, but within the same fraction from any distribution that is indistinguishable from it using small threshold circuits. If the GAB algorithm fails to produce a high entropy distribution indistinguishable from the sampled distribution, it instead produces a witness that no such distribution exists.
Prof. Russell Impagliazzo is a Professor of Computer Science and Engineering at UCSD.
Learning Distributed Representations of Sentences
Language is one major way for humans to communicate with each other and understand the words. While words and sentences are easier for us to understand, numbers and values are much easier for machines to manipulate and operate on. This raises the question of how best to create a numeric vectorized representation of words and sentences. A vectorized representation of language that encodes semantics will help machines interact better with people. Recent trends in deep learning have shown that we can achieve this by learning from data, either labeled or unlabeled. Numerous studies have shown that the context in which a word or sentence is placed contributes to its semantics, and also plays an important role in human language. In this talk, I will walk you through recent research in deep learning on learning distributed sentence representations by exploiting context information in an unsupervised/self-supervised fashion.
Shuai Tang is a third-year PhD student in the Cognitive Science department at UC San Diego. He is interested in learning vector representations of language. More information is at shuaitang.github.io
Learning in Robot Manipulation: From Model-free Control to Deep Neural Networks
Robot manipulation has traditionally been a problem of model-based control and motion planning in structured environments. This has made robots very well suited for well-defined environments such as a manufacturing floor, where humans are minimally present. However, as more complex and partially observable environments and more complex robots are proposed (such as flexible trunk-like manipulators), outcomes of robot actions become more and more uncertain, and model-based methods tend to fail or produce unexpected results. Erratic behavior makes robots dangerous in human environments, and thus new approaches must be taken. In this talk, I will discuss our research in learning for robot manipulation: the use of data for learning to control robots online and to perform fast motion planning and adaptation to changing environments. We will discuss the application to robotic surgery and the collaboration of an autonomous robot with surgeons to perform a shared task.
Bio: Michael Yip is an Assistant Professor of Electrical and Computer Engineering at UCSD and directs the Advanced Robotics and Controls Lab (ucsdarclab.com). His group has developed automated endoscopic and catheter robots for treating heart and lung disease, designing artificial intelligence for robot-human collaboration in surgery, and aiding surgeon teams with augmented reality for minimally invasive surgery. His work in learning-based robot automation for surgery has won several best paper awards, including the best paper award for IEEE Robotics and Automation Letters for 2016. Before UCSD, Dr. Yip was a research associate with Disney Research involved in building next generation animatronics. He received a B.Sc. in Mechatronics Engineering from the University of Waterloo, an M.S. in Electrical Engineering from the University of British Columbia, and a Ph.D. in Bioengineering from Stanford University. His research focuses on reinforcement learning and deep learning for robot manipulator control and planning.
Towards Practical Contextual Bandits
The contextual bandits problem is about learning to make decisions with high rewards while interacting with the environment and receiving partial feedback. It is especially useful in modeling applications such as personalized recommendation systems, and has indeed been deployed in practice recently. However, current algorithms for contextual bandits are still very limited and cannot deal with practical scenarios while having strong theoretical guarantees. In this talk I will discuss two recent works on making contextual bandits more practical. The first one focuses on combining different bandit algorithms in a blackbox way, so that the final algorithm simultaneously enjoys the advantages of all the algorithms it combines. The second one discusses efficient and optimal contextual bandits algorithms in non-stationary environments, improving on the state of the art, which can only deal with i.i.d. data.
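To make the interaction protocol concrete, here is a minimal sketch of a contextual bandit loop with a simple epsilon-greedy learner that keeps one running mean per (context, action) pair. It is a textbook baseline used only for illustration, not one of the algorithms from the talk, and the names `run_epsilon_greedy` and `reward_fn` as well as the toy environment are hypothetical.

```python
import random


def run_epsilon_greedy(contexts, reward_fn, n_actions, epsilon=0.1, seed=0):
    """Toy contextual bandit loop with a tabular epsilon-greedy learner.

    contexts: iterable of hashable context values, one per round.
    reward_fn(context, action): reward of the chosen action only; rewards of
    the other actions are never revealed (the "partial feedback").
    """
    rng = random.Random(seed)
    counts = {}  # (context, action) -> number of times the action was chosen
    means = {}   # (context, action) -> running mean of observed rewards
    total = 0.0
    for x in contexts:
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)  # explore: uniformly random action
        else:
            # Exploit: action with the highest estimated reward for this context.
            a = max(range(n_actions), key=lambda b: means.get((x, b), 0.0))
        r = reward_fn(x, a)
        n = counts.get((x, a), 0) + 1
        counts[(x, a)] = n
        m = means.get((x, a), 0.0)
        means[(x, a)] = m + (r - m) / n  # incremental mean update
        total += r
    return total, means


# Toy environment (an assumption): the rewarded action equals the context.
contexts = [0, 1] * 50
reward = lambda x, a: 1.0 if a == x else 0.0
greedy_total, greedy_means = run_epsilon_greedy(contexts, reward, 2, epsilon=0.0)
explore_total, _ = run_epsilon_greedy(contexts, reward, 2, epsilon=0.2)
```

With `epsilon=0.0` the learner in this toy environment never tries the second action for context 1, so it collects reward only on the context-0 rounds; this is the exploration problem that bandit algorithms are designed to handle.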
Haipeng Luo has been an assistant professor in the Computer Science Department at the University of Southern California since Fall 2017. He completed his PhD at Princeton under the supervision of Robert Schapire, and afterwards spent one year at Microsoft Research NYC as a postdoctoral researcher. His research interest is mostly in theoretical machine learning, with a focus on online learning, bandit problems, boosting, optimization, and game theory. His work won best paper awards at ICML'15 and NIPS'15.