DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO

CSE 259: AI seminar

 Fall 2008


Important: Due to scheduling conflicts, the AI seminar will happen at 2pm on Mondays in the Fall quarter, in room 1202.  

CSE 259 is a series of talks devoted to current research in artificial intelligence, machine learning, and related areas.  All talks are free and open to the general public.  Everyone interested in the AI seminar should join the UCSD AI mailing list.  

date speaker (* means not confirmed) title or topic
9/29 Ran El-Yaniv

Dr. El-Yaniv is an associate professor at the Computer Science Department at the Technion - Israel Institute of Technology. 
On tighter performance guarantees for classification via coverage compromises

In many practical learning tasks for which only small training samples are available, the best provable learning curves may fail to provide meaningful error bounds.  This talk will present some recent attempts to supply tighter (and meaningful) error bounds only for a fraction of the instances that fall in ``easy'' parts of the domain.


In our setting the objective is to both generate a (more accurate) classifier and a corresponding error bound that only applies to a part of the domain.  While the test error of a classifier can, in principle, decrease monotonically with the coverage, the error bound cannot in general behave in this way.  As the coverage decreases, the labeled training set decreases accordingly and the statistical confidence interval of any error bound increases.

We will focus on a new learning problem that we call "selective-PAC learning."  This problem elicits the learning of classifiers adhering to a controlled tradeoff between the coverage of their performance guarantee and its tightness. We will discuss selective-PAC preliminary solutions for both inductive and transductive settings, and show empirical results providing a proof of concept for the proposed approach.  This is an ongoing joint work with Yair Wiener.
10/6 Yoram Singer

Dr. Singer is a researcher at Google, and formerly a faculty member at the School of Computer Science and Engineering of The Hebrew University.
Efficient algorithms for learning sparse representations in high dimensions

We describe efficient algorithms for learning sparse representations.  The first algorithm we describe relies on efficient procedure for projecting a vector onto the L1-ball. We present two projection methods. The first method performs exact projection in O(n) time, where n is the dimension of the space. The second method works on vectors k of whose elements are perturbed outside the L1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces, such as text categorization applications and web-based applications.  The second family of algorithms we describe is tailored for empirical loss minimization with non-differentiable regularization. The algorithm first performs an unconstrained gradient descent step.  Then, it casts and solves an instantaneous optimization problem that trades off between minimizing the regularization term while keeping close proximity to the result of the first phase. This gives a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning.  Furthermore, the two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as L1.  We derive concrete and very simple algorithms for various regularization functions. We also show how to construct efficient algorithms for mixed-norms.  We demonstrate the potential of the algorithms in a series of experiments with various datasets.

Joint work with John Duchi, Shai Shalev-Shwartz, Tushar Chandra, and Tal Shaked.
10/13
Deepak Agarwal

Dr. Agarwal is currently a senior research scientist at Yahoo! Research. Prior to joining Yahoo!, he was a member of the statistics department at AT&T Research. His current research interests include large scale regression for massive, sparse and noisy data via "feature aggregation", anomaly detection in high dimensional spaces, multi-armed bandit problems for learning taxonomies and statistical methods for social network analysis. He won the best applications paper award at Siam Data Mining 2004 for his Bayesian modeling work on large sparse social networks via stochastic blockmodels, and more recently the best research paper award at KDD 2007 for his work that propose a general class of models for large sparse dyadic data.
Predictive discrete latent models for large incomplete matrix data

Prediction with incomplete matrix data is an important task in applications like collaborative filtering, web advertising, e-commerce etc. We propose a novel statistical model that performs such prediction tasks for large scale problems in the presence of covariate information. We illustrate our approach through large scale simulation experiments and analysis of data obtained from movie recommendation and web advertising applications. The key feature of our approach is the simultaneous estimation of covariate effects and local interaction structure through a discrete latent factor model.  The latent factors provide a predictive model that is both accurate and interpretable. To further combat sparsity that is often present in real data, we enhance the discrete latent factors by coupling it with a factorization model. 
10/20
Michael Mahoney

Dr. Mahoney is currently a research scientist in the math department at Stanford University.  He is working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks.  He has been a faculty member at Yale University and a researcher at Yahoo.

Community structure in large social and information networks

The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of people who share common interests, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network.  Motivated by difficulties we experienced at actually finding meaningful communities in large real-world networks, we have performed a large scale analysis of a wide range of social and information networks.  Our main methodology uses local spectral methods and involves computing isoperimetric properties of the networks at various size scales -- a novel application of ideas from scientific computation to internet data analysis. Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, and at larger size scales, the best possible communities gradually ``blend in'' with the rest of the network and thus become less ``community-like.'' This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.
10/27 Chris Kanan

Chris Kanan is a Ph.D. student in CSE at UCSD.
SUN: Top-down visual saliency using natural statistics

When people try to find particular objects in natural scenes they make extensive use of top-down cognitive knowledge.  Although many forms of top-down knowledge have been incorporated into saliency map models of visual search, the role of object appearance has been infrequently investigated. Here we present an appearance based saliency model derived in a Bayesian framework. We compare our model's ability to predict where people will look in images when they are told to search for a particular object with bottom-up saliency algorithms as well as another top-down algorithm known as the Contextual Guidance model.  Although both top-down approaches use very different types of information, they achieve similar performance, and both substantially exceed purely bottom-up models that do not take the current task into account. Our experiments reveal that the simple model of object appearance we present can predict human fixations quite well, even making the same mistakes as people.
11/3 Eamonn Keogh

Dr. Keogh is an associate professor of computer science at UC Riverside.
Shaplets, motifs and discords: Primitives for mining time series and image archives

The past decade has seen tremendous interest in mining of time series and shape datasets, as such data can be found in domains as diverse as entertainment, finance, medicine and astronomy.  However, much of this work has focused on toy problems, with a few thousand objects. In recent years, our research group has made an effort to address the problems of classification, clustering, query-by-content, motif discovery, and outlier detection on truly massive datasets, with 100 million+ objects.

In this talk we will summarize our research findings over the last two years, and show that a small set of primitives, shaplets, motifs and discords, allow us to solve essentially all problems in shape/time series data mining with efficient, effective and interpretable results.  We will demonstrate the utility of our ideas, with case studies in anthropology, astronomy, entomology, historical manuscript annotation and medicine.
Wednesday 11/5, 2pm,
in CSE 4140.

Note special day and room!
Eric Yu-En Lu

Dr. Yu is a postdoctoral fellow at Cambridge University.
Low dimensional random projections and similarity search

Random projection (RP) is a common technique for dimensionality reduction under $L_2$ norm for which many significant space embedding results have been demonstrated. However, many similarity search applications often require very low dimension embeddings in order to reduce overhead and boost performance. Inspired by the use of symmetric probability distributions in previous work, we propose a novel RP algorithm, Beta Random Projection, and give its probabilistic analyses based on Beta and Gaussian approximations. We evaluate the algorithm in terms of standard similarity metrics with other RP algorithms as well as the singular value decomposition (SVD). Our experimental results show that BRP preserves both similarity metrics well and, under various dataset types including random point sets, text (TREC5) and images, provides sharper and consistent performance.
11/10 Angela Yu

Dr. Yu recently joined the UCSD Cognitive Science Department as an Assistant Professor.  She received her PhD from Gatsby Computational Neuroscience Unit at University College London, and did her postdoctoral training at Princeton University.  In her research, she uses computational and mathematical models, and complementary experimental work, to understand the neural machinery underlying cognitive processes, such as attention, vision, learning, and decision-making.
Optimal decision making under uncertainty and temporal pressure

Most models of decision-making in neuroscience assume an infinite horizon, which yields an optimal solution that integrates evidence up to a fixed decision threshold; however, under most experimental as well as naturalistic behavioral settings, the decision has to be made before some finite deadline, which is often experienced as a stochastic quantity, either due to variable external constraints or internal timing uncertainty. In this work, we formulate this problem as sequential hypothesis testing under a stochastic horizon. We use dynamic programming tools to show that, for a large class of deadline distributions, the Bayes-optimal solution requires integrating evidence up to a threshold that declines monotonically over time. We use numerical simulations to illustrate the optimal policy in the special cases of a fixed deadline and one that is drawn from a gamma distribution.
11/17 Fan Chung

Dr. Chung is a professor of mathematics and of computer science at UCSD.
Four graph partitioning algorithms

We will discuss four partitioning algorithms using eigenvectors, random walks, PageRank and their variations.  In particular, we will examine local partitioning algorithms, which find a cut near a specified starting vertex, with a running time that depends on the size of the small side of the cut, rather than on the size of the input graph (which can be prohibitively large).  Three of the four partitioning algorithms are local algorithms and are particularly appropriate for applications arising in connection with web graphs and Internet economics.

11/24 Fei Sha

Dr. Sha is an assistant professor of computer science at the University of Southern California.
DiscLDA: Discriminative learning for dimensionality reduction and classification

Abstract:  Probabilistic topic models (and their extensions) have become popular as models of latent structures in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood estimation, an approach which may be suboptimal in the context of an overall classification problem. In this paper, we describe DiscLDA, a discriminative learning framework for such models as Latent Dirichlet Allocation (LDA) in the setting of dimensionality reduction with supervised side information. In DiscLDA, a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood using Monte Carlo EM. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroup document classification task.

Joint work with Simon Lacoste-Julien (UC Berkeley) and Michael I. Jordan (UC Berkeley)
12/1 Shlomo Dubnov

Dr. Dubnov is an associate professor of music at UCSD.
The importance of now: Aspects of information dynamics and music perception

Interactive artistic applications such as machine improvisation and participatory performance require dynamic association of data over time and monitoring of interest, excitement or surprisal. Motivated by cognitive models of music expectancy, several measures have been proposed for analysis of music structure in terms of information dynamics. In the talk I will survey some of the recent works applying information rate, predictive information and so on to MIDI and audio data, and relate it to a new work on musical information geometry.

Students should register for the AI seminar for one unit of credit, every quarter that they attend. 


Most recently updated on November 17, 2008 by Charles Elkan, elkan@cs.ucsd.edu