| date | speaker (* means not confirmed) | title or
topic |
| 9/29 | Ran El-Yaniv Dr. El-Yaniv is an associate professor at the Computer Science Department at the Technion - Israel Institute of Technology. |
On tighter performance guarantees
for classification via coverage compromises In many practical learning tasks for which only small training samples are available, the best provable learning curves may fail to provide meaningful error bounds. This talk will present some recent attempts to supply tighter (and meaningful) error bounds only for a fraction of the instances that fall in ``easy'' parts of the domain. In our setting the objective is to both generate a (more accurate) classifier and a corresponding error bound that only applies to a part of the domain. While the test error of a classifier can, in principle, decrease monotonically with the coverage, the error bound cannot in general behave in this way. As the coverage decreases, the labeled training set decreases accordingly and the statistical confidence interval of any error bound increases. We will focus on a new learning problem that we call "selective-PAC learning." This problem elicits the learning of classifiers adhering to a controlled tradeoff between the coverage of their performance guarantee and its tightness. We will discuss selective-PAC preliminary solutions for both inductive and transductive settings, and show empirical results providing a proof of concept for the proposed approach. This is an ongoing joint work with Yair Wiener. |
| 10/6 | Yoram Singer Dr. Singer is a researcher at Google, and formerly a faculty member at the School of Computer Science and Engineering of The Hebrew University. |
Efficient algorithms for learning
sparse representations in high dimensions We
describe efficient
algorithms for learning sparse representations. The first
algorithm we describe relies on efficient procedure for projecting a
vector onto the L1-ball. We present two projection methods. The first
method performs exact projection in O(n) time, where n is the dimension
of the space. The second method works on vectors k of whose elements
are perturbed outside the L1-ball, projecting in O(k log(n)) time. This
setting is especially useful for online learning in sparse feature
spaces, such as text categorization applications and web-based
applications. The second family of algorithms we describe is
tailored for empirical loss minimization with non-differentiable
regularization. The algorithm first performs an unconstrained gradient
descent step. Then, it casts and solves an instantaneous
optimization problem that trades off between minimizing the
regularization term while keeping close proximity to the result of the
first phase. This gives a simple yet effective algorithm that can be
used for batch penalized risk minimization and online learning.
Furthermore, the two phase approach enables sparse solutions
when
used in conjunction with regularization functions that promote
sparsity, such as L1. We derive concrete and very simple
algorithms for various regularization functions. We also show how to
construct efficient algorithms for mixed-norms. We
demonstrate
the potential of the algorithms in a series of experiments with various
datasets.
Joint work with John Duchi, Shai Shalev-Shwartz, Tushar Chandra, and Tal Shaked. |
| 10/13 |
Deepak
Agarwal Dr. Agarwal is currently a senior research scientist at Yahoo! Research. Prior to joining Yahoo!, he was a member of the statistics department at AT&T Research. His current research interests include large scale regression for massive, sparse and noisy data via "feature aggregation", anomaly detection in high dimensional spaces, multi-armed bandit problems for learning taxonomies and statistical methods for social network analysis. He won the best applications paper award at Siam Data Mining 2004 for his Bayesian modeling work on large sparse social networks via stochastic blockmodels, and more recently the best research paper award at KDD 2007 for his work that propose a general class of models for large sparse dyadic data. |
Predictive discrete latent models
for large incomplete matrix data Prediction
with
incomplete matrix data is an important task in applications like
collaborative filtering, web advertising, e-commerce etc. We propose a
novel statistical model that performs such prediction tasks for large
scale problems in the presence of covariate information. We illustrate
our approach through large scale simulation experiments and analysis of
data obtained from movie recommendation and web advertising
applications. The key feature of our approach is the simultaneous
estimation of covariate effects and local interaction structure through
a discrete latent factor model. The latent factors provide a
predictive model that is both accurate and interpretable. To further
combat sparsity that is often present in real data, we enhance the
discrete latent factors by coupling it with a factorization
model.
|
| 10/20 |
Michael
Mahoney Dr. Mahoney is currently a research scientist in the math department at Stanford University. He is working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. He has been a faculty member at Yale University and a researcher at Yahoo. |
Community structure in large
social and information networks The
concept of a
community is central to social network analysis, and thus a large body
of work has been devoted to identifying community structure. For
example, a community may be thought of as a set of web pages on related
topics, a set of people who share common interests, or more generally
as a set of nodes in a network more similar amongst themselves than
with the remainder of the network. Motivated by difficulties
we
experienced at actually finding meaningful communities in large
real-world networks, we have performed a large scale analysis of a wide
range of social and information networks. Our main
methodology
uses local spectral methods and involves computing isoperimetric
properties of the networks at various size scales -- a novel
application of ideas from scientific computation to internet data
analysis. Our empirical results suggest a significantly more refined
picture of community structure than has been appreciated previously.
Our most striking finding is that in nearly every network dataset we
examined, we observe tight but almost trivial communities at very small
size scales, and at larger size scales, the best possible communities
gradually ``blend in'' with the rest of the network and thus become
less ``community-like.'' This behavior is not explained, even at a
qualitative level, by any of the commonly-used network generation
models. Moreover, this behavior is exactly the opposite of what one
would expect based on experience with and intuition from expander
graphs, from graphs that are well-embeddable in a low-dimensional
structure, and from small social networks that have served as testbeds
of community detection algorithms. Possible mechanisms for reproducing
our empirical observations will be discussed, as will implications of
these findings for clustering, classification, and more general data
analysis in modern large social and information networks.
|
| 10/27 | Chris Kanan Chris Kanan is a Ph.D. student in CSE at UCSD. |
SUN: Top-down visual saliency
using natural statistics When
people try to find particular objects in natural scenes they make
extensive use of top-down cognitive knowledge. Although many
forms of top-down knowledge have been incorporated into saliency map
models of visual search, the role of object appearance has been
infrequently investigated. Here we present an appearance based saliency
model derived in a Bayesian framework. We compare our model's ability
to predict where people will look in images when they are told to
search for a particular object with bottom-up saliency algorithms as
well as another top-down algorithm known as the Contextual Guidance
model. Although both top-down approaches use very different
types
of information, they achieve similar performance, and both
substantially exceed purely bottom-up models that do not take the
current task into account. Our experiments reveal that the simple model
of object appearance we present can predict human fixations quite well,
even making the same mistakes as people.
|
| 11/3 | Eamonn Keogh Dr. Keogh is an associate professor of computer science at UC Riverside. |
Shaplets, motifs and discords:
Primitives for mining time series and image archives The
past decade has seen
tremendous interest in mining of time series and shape datasets, as
such data can be found in domains as diverse as entertainment, finance,
medicine and astronomy. However, much of this work has
focused on
toy problems, with a few thousand objects. In recent years, our
research group has made an effort to address the problems of
classification, clustering, query-by-content, motif discovery, and
outlier detection on truly massive datasets, with 100 million+ objects.
In this talk we will summarize our research findings over the last two years, and show that a small set of primitives, shaplets, motifs and discords, allow us to solve essentially all problems in shape/time series data mining with efficient, effective and interpretable results. We will demonstrate the utility of our ideas, with case studies in anthropology, astronomy, entomology, historical manuscript annotation and medicine. |
| Wednesday
11/5, 2pm, in CSE 4140. Note special day and room! |
Eric
Yu-En Lu Dr. Yu is a postdoctoral fellow at Cambridge University. |
Low
dimensional random projections and similarity search Random
projection (RP)
is a common technique for dimensionality reduction under $L_2$ norm for
which many significant space embedding results have been demonstrated.
However, many similarity search applications often require very low
dimension embeddings in order to reduce overhead and boost performance.
Inspired by the use of symmetric probability distributions in previous
work, we propose a novel RP algorithm, Beta Random Projection, and give
its probabilistic analyses based on Beta and Gaussian approximations.
We evaluate the algorithm in terms of standard similarity metrics with
other RP algorithms as well as the singular value decomposition (SVD).
Our experimental results show that BRP preserves both similarity
metrics well and, under various dataset types including random point
sets, text (TREC5) and images, provides sharper and consistent
performance.
|
| 11/10 | Angela Yu Dr. Yu recently joined the UCSD Cognitive Science Department as an Assistant Professor. She received her PhD from Gatsby Computational Neuroscience Unit at University College London, and did her postdoctoral training at Princeton University. In her research, she uses computational and mathematical models, and complementary experimental work, to understand the neural machinery underlying cognitive processes, such as attention, vision, learning, and decision-making. |
Optimal
decision making under uncertainty and temporal pressure Most models of decision-making
in neuroscience assume an infinite horizon, which yields an optimal
solution that integrates evidence up to a fixed decision threshold; however, under most
experimental as well as naturalistic behavioral settings, the decision has to
be made before some finite deadline, which is often experienced as a stochastic
quantity, either due to variable external constraints or internal timing uncertainty.
In this work, we formulate this problem as sequential hypothesis testing under a
stochastic horizon. We use dynamic programming tools to show that, for a large
class of deadline distributions, the Bayes-optimal solution requires integrating evidence
up to a threshold that declines monotonically over time. We use numerical
simulations to illustrate the optimal policy in the special cases of a fixed deadline and
one that is drawn from a gamma distribution.
|
| 11/17 | Fan Chung Dr. Chung is a professor of mathematics and of computer science at UCSD. |
Four graph partitioning algorithms We will discuss four partitioning algorithms using eigenvectors, random walks, PageRank and their variations. In particular, we will examine local partitioning algorithms, which find a cut near a specified starting vertex, with a running time that depends on the size of the small side of the cut, rather than on the size of the input graph (which can be prohibitively large). Three of the four partitioning algorithms are local algorithms and are particularly appropriate for applications arising in connection with web graphs and Internet economics. |
| 11/24 | Fei Sha Dr. Sha is an assistant professor of computer science at the University of Southern California. |
DiscLDA: Discriminative learning
for dimensionality reduction and classification Abstract:
Probabilistic topic models (and their extensions) have become
popular as models of latent structures in collections of text documents
or images. These models are usually treated as generative models and
trained using maximum likelihood estimation, an approach which may be
suboptimal in the context of an overall classification problem. In this
paper, we describe DiscLDA, a discriminative learning framework for
such models as Latent Dirichlet Allocation (LDA) in the setting of
dimensionality reduction with supervised side information. In DiscLDA,
a class-dependent linear transformation is introduced on the topic
mixture proportions. This parameter is estimated by maximizing the
conditional likelihood using Monte Carlo EM. By using the transformed
topic mixture proportions as a new representation of documents, we
obtain a supervised dimensionality reduction algorithm that uncovers
the latent structure in a document collection while preserving
predictive power for the task of classification. We compare the
predictive power of the latent structure of DiscLDA with unsupervised
LDA on the 20 Newsgroup document classification task.
Joint work with Simon Lacoste-Julien (UC Berkeley) and Michael I. Jordan (UC Berkeley) |
| 12/1 | Shlomo Dubnov Dr. Dubnov is an associate professor of music at UCSD. |
The importance of now: Aspects of
information dynamics and music perception Interactive artistic applications such as machine improvisation and participatory performance require dynamic association of data over time and monitoring of interest, excitement or surprisal. Motivated by cognitive models of music expectancy, several measures have been proposed for analysis of music structure in terms of information dynamics. In the talk I will survey some of the recent works applying information rate, predictive information and so on to MIDI and audio data, and relate it to a new work on musical information geometry. |
Most recently updated on November 17, 2008 by Charles Elkan, elkan@cs.ucsd.edu