
Charles Elkan

As a professor in the computer science and engineering department at the University of California, San Diego, my main research interests are in machine learning and data science. I work especially on foundational questions raised by applications in business and biomedicine. Research in my group has been funded by an R01 grant from the National Institutes of Health, by the National Science Foundation, by a grant from the University of California National Lab cooperation program, and by gifts from Intel and other companies.

My Ph.D. is in computer science from Cornell University, with a graduate minor in economics. While at Cornell, I was also a visiting Ph.D. student at Stanford University advised by John McCarthy, and before joining UCSD I was a postdoctoral fellow at the University of Toronto. My undergraduate degree is in mathematics from Cambridge University, with a focus on statistics and optimization. While on leave from UCSD, I have been a visiting associate professor at Harvard University.

From 2014 to 2018 I was the first Amazon Fellow, building and leading Amazon's central machine learning team in Seattle, Palo Alto, and New York. Since 2018, I have been a managing director and the global head of machine learning at Goldman Sachs in New York. For more information, see this curriculum vitae.

Before joining Amazon, in winter 2014, I taught CSE 250B, a graduate course on machine learning. In spring 2013 I taught CSE 255, a graduate course on data science and analytics, while in fall 2012 I taught CSE 250A, a different graduate course on machine learning.

For a complete list of publications, with links to full papers, see DBLP and this Google Scholar profile.

Selected publications

Efficient Elastic Net Regularization for Sparse Linear Models
Z. Lipton and C. Elkan
http://arxiv.org/abs/1505.06449 pdf 
We show how to train sparse linear models efficiently with elastic net regularization. The new algorithm applies stochastic gradient updates to non-zero features only, bringing weights current as needed with closed-form updates. We provide dynamic programming algorithms that perform each delayed update in constant time. Experimental results show that on a bag-of-words dataset with 260,941 features, but only 88 nonzero features on average per training example, the dynamic programming method trains a logistic regression classifier with elastic net regularization over 2000 times faster than a naive implementation that updates every weight at each step.
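The constant-time delayed update can be illustrated with a minimal sketch. This is not the paper's implementation; the function name lazy_shrink and the assumed per-step update w ← sign(w)·max(0, |w|(1 − ηλ₂) − ηλ₁) are ours for illustration. Because the magnitude only shrinks, k delayed steps collapse to a geometric series:

```python
import numpy as np

def lazy_shrink(w, k, eta, l1, l2):
    """Apply k delayed elastic-net shrinkage steps to one weight in O(1).

    A single step is w <- sign(w) * max(0, |w| * (1 - eta*l2) - eta*l1).
    Since the magnitude only decreases, iterating k steps collapses to a
    geometric series, so a weight untouched for k iterations can be
    brought current in constant time.
    """
    beta = 1.0 - eta * l2
    if beta == 1.0:  # no L2 decay: plain repeated soft-thresholding
        decayed = abs(w) - k * eta * l1
    else:
        decayed = abs(w) * beta**k - eta * l1 * (1 - beta**k) / (1 - beta)
    return float(np.sign(w)) * max(0.0, decayed)

# Sanity check against the naive loop that applies the update k times.
w_naive = 0.5
for _ in range(7):
    w_naive = np.sign(w_naive) * max(0.0, abs(w_naive) * (1 - 0.1 * 0.1) - 0.1 * 0.01)
```

The closed form matches the naive loop to floating-point precision, which is the point: skipped iterations never need to be replayed one by one.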
Probabilistic Modeling of a Sales Funnel to Prioritize Leads
B. Duncan and C. Elkan
In Proceedings of the 21st ACM Conference on Knowledge Discovery and Data Mining (KDD), July 2015
This paper presents probabilistic models that rank sales leads by their probability of converting into a solid opportunity or a successful sale, and by expected revenue. The trained models replace traditional lead scoring systems, which are error-prone and not probabilistic. Experimental results are shown on real sales data from two companies. For one company, a 307% increase in the number of successful sales is achieved, as well as a dramatic increase in total revenue. Deployment shows additional benefits, including less time needed to qualify leads and fewer calls placed to schedule a product demo.

A Critical Review of Recurrent Neural Networks for Sequence Learning
Z. Lipton and C. Elkan
http://arxiv.org/abs/1506.00019 pdf 
Recurrent neural networks are connectionist models that capture the dynamics of sequences via cycles in the network of nodes, retaining a state that can represent information from an arbitrarily long context window. In recent years, systems based on long short-term memory and bidirectional architectures have demonstrated ground-breaking performance on tasks as varied as image captioning, language translation, and handwriting recognition. We synthesize the research that has made practical these powerful learning methods, providing a self-contained explanation of the state of the art together with a historical perspective and references to primary research.
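The recurrence these models share can be sketched with a vanilla RNN cell; the gating machinery of LSTMs is omitted, and all dimensions and weights below are arbitrary toy choices, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 5                       # toy sizes, chosen arbitrarily
W_xh = rng.normal(scale=0.1, size=(d_hid, d_in))
W_hh = rng.normal(scale=0.1, size=(d_hid, d_hid))
b_h = np.zeros(d_hid)

def rnn_forward(xs):
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    The cycle through W_hh lets h_t summarize the entire prefix
    x_1..x_t, however long the context window is.
    """
    h = np.zeros(d_hid)
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

sequence = [rng.normal(size=d_in) for _ in range(4)]
states = rnn_forward(sequence)
```

The fixed-size state is what makes the context window "arbitrarily long" in principle; in practice, training such cells is what LSTM gating and bidirectional architectures improve on.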

Differential Privacy Based on Importance Weighting
Z. Ji and C. Elkan
In Machine Learning, June 2013 pdf 
We propose and analyze a general method for publishing data while still protecting privacy, by computing weights that make an already public dataset analogous to the dataset that must be kept private. The weights are importance sampling coefficients that are regularized and have noise added to protect privacy. The weights allow arbitrary queries to be answered approximately while provably guaranteeing differential privacy. Experiments show that the new mechanism performs well even when the privacy budget is small, and when the public and private datasets are drawn from different populations.
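A rough sketch of the idea follows. This is not the paper's calibrated mechanism: the logistic model, regularization strength, noise scale, and clipping below are illustrative choices of ours. The shape of the method is to fit a regularized classifier separating public from private records, perturb its coefficients with Laplace noise, and use the resulting odds as importance weights on the public data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy public and private samples from slightly different populations.
public = rng.normal(0.0, 1.0, size=(300, 1))
private = rng.normal(0.5, 1.0, size=(300, 1))

def fit_logistic(X, y, l2=1.0, lr=0.1, steps=500):
    """L2-regularized logistic regression by gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

X = np.vstack([public, private])
y = np.concatenate([np.zeros(len(public)), np.ones(len(private))])
w = fit_logistic(X, y)

# Illustrative privacy step: Laplace noise added to the coefficients.
epsilon = 1.0
w_noisy = w + rng.laplace(scale=1.0 / epsilon, size=w.shape)

# Importance weight of each public record: its odds of looking "private".
Xb_pub = np.hstack([public, np.ones((len(public), 1))])
weights = np.exp(np.clip(Xb_pub @ w_noisy, -30.0, 30.0))

# A weighted query on the public data approximates the private answer.
weighted_mean = np.average(public[:, 0], weights=weights)
```

The appeal of the approach is visible even in this toy: once the weights are published, arbitrary queries can be answered by reweighting the public data, with the privacy cost paid once.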
Link Prediction via Matrix Factorization
A. K. Menon and C. Elkan
In Proceedings of the European Conference on Machine Learning (ECML), September 2011 pdf 
We show how to learn to predict which edges exist in a social network (or other graph) using a matrix factorization approach. The new method learns latent features that capture the structure of a network, and combines these with explicit side-information. The algorithm directly optimizes a ranking loss, and scales to very large networks. Results on many social and other networks show the superior accuracy of the new method.
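A minimal sketch of the core ingredients, namely latent node features trained with a pairwise ranking objective. The BPR-style update below is our illustrative stand-in for the paper's ranking loss, and the side-information terms the full method combines are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy undirected graph: two 4-node cliques with no edges between them.
n, k = 8, 2
A = np.zeros((n, n))
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0

U = rng.normal(scale=0.1, size=(n, k))   # one latent feature vector per node
edges = np.argwhere(A == 1.0)

# Pairwise ranking updates: an observed edge (i, j) should score higher
# than a sampled non-edge (i, m), where score(i, j) = U[i] . U[j].
lr = 0.05
for _ in range(3000):
    i, j = edges[rng.integers(len(edges))]
    m = int(rng.integers(n))
    if m == i or A[i, m] == 1.0:
        continue
    ui, uj, um = U[i].copy(), U[j].copy(), U[m].copy()
    x = ui @ uj - ui @ um
    g = 1.0 / (1.0 + np.exp(x))          # 1 - sigmoid(x): descent step on -log sigmoid(x)
    U[i] += lr * g * (uj - um)
    U[j] += lr * g * ui
    U[m] -= lr * g * ui
```

Each update touches only three latent vectors, which is why this style of training scales to very large networks.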
Nonlinear Support Vector Machines Can Systematically Identify Stocks with High and Low Future Returns
R. Huerta, C. Elkan, and F. Corbacho
In Algorithmic Finance pdf
This paper rigorously develops a reliable model to identify stocks with expected high and low future returns. Technical and fundamental features are computed using CRSP and Compustat data. From 1981 to 2010, taking into account realistic trading costs and constraints, the model leads to annual Jensen alpha over 10% with standard deviation 8%.
Accounting for Word Burstiness in Topic Models
G. Doyle and C. Elkan
In Proceedings of the 26th International Conference on Machine Learning (ICML), July 2009 pdf 
A fundamental property of language is that if a word is used once, then it is more likely to be used again. Previous topic models fail to capture this burstiness phenomenon. This paper presents a topic model that uses Dirichlet compound multinomial distributions to model burstiness. The new model achieves better goodness of fit in text mining with far fewer topics than standard latent Dirichlet allocation (LDA). More information.
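The burstiness effect is visible directly in the posterior predictive distribution of the Dirichlet compound multinomial. A small worked sketch, with a made-up three-word vocabulary and illustrative parameter values:

```python
import numpy as np

def dcm_next_word_prob(alpha, counts):
    """Posterior predictive of the Dirichlet compound multinomial:
    P(next word = w | counts n) = (alpha_w + n_w) / (sum(alpha) + sum(n)).

    Small alpha makes the distribution adapt sharply to observed counts,
    which is exactly the burstiness effect: a word used once becomes
    much more likely to be used again.
    """
    alpha = np.asarray(alpha, dtype=float)
    counts = np.asarray(counts, dtype=float)
    return (alpha + counts) / (alpha.sum() + counts.sum())

alpha = np.array([0.1, 0.1, 0.1])                   # toy three-word vocabulary
before = dcm_next_word_prob(alpha, [0, 0, 0])[0]    # 1/3 before any observation
after = dcm_next_word_prob(alpha, [1, 0, 0])[0]     # jumps after one occurrence
```

With these toy values, the probability of word 0 rises from 1/3 to about 0.85 after a single occurrence; a plain multinomial topic model cannot reproduce this jump.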
Some invited talks

Learning to make predictions in networks. Department of Computer Science and Automation, Indian Institute of Science, Bangalore, December 21, 2011.

The analytics landscape: A personal view. Indo-US Workshop on Large Scale Data Analytics and Intelligent Services, Bangalore, December 20, 2011. pdf

A vision for reinforcement learning and predictive maintenance. Keynote talk, Workshop on Data Mining for Service and Maintenance, ACM International Conference on Knowledge Discovery and Data Mining (KDD), August 21, 2011. pdf


Some sources that have interviewed me or mentioned research from my group: Dr. Dobb's Journal, New Scientist, Reuters, Miller-McCune, and the Wall Street Journal.

2020/1/24: Committee member for the Ph.D. defense of Morteza Ashraphijuo at Columbia University.

January 2020: Three lectures on deep learning at the 6th International Winter School on Big Data in Ancona, Italy.

2019/12/13: Panel speaker at the NeurIPS 2019 Workshop on Robust AI in Financial Services.

2019/10/24: Keynote talk at the AI World event in Boston.

August 2019: Area chair for NeurIPS 2019 to be held in Vancouver.

2015/12/12: Invited talk at the NIPS Workshop on Learning in Large Label Spaces.

2015/6/19: Keynote speech on differential privacy at Benelearn 2015.

2015/6/3: Keynote speech on data science at the Canadian AI conference.

2014/3/3: Invited talk at the Machine Learning and Data Analytics Symposium in Qatar.

2013/6/11: Our paper on differential privacy has been accepted by Machine Learning.

2013/5/15: Keynote talk at the Qualcomm analytics summit.

2013/5/10: Postdoc position available funded by Intel (filled).

2013/4/16: Kickoff meeting for research with SRI funded by IARPA.

2013/3/29: Visiting Cornell Tech and Google New York.

2013/2/20: Speaking at the IE Group Chief Data Scientist Summit, San Diego.

2013/1/29: Qualcomm distinguished lecture at ICNC 2013.

2012/12/14: Just finished teaching CSE 250A. 63 students in a graduate course!

2012/12/11: Keynote speaker at the International Symposium on Multimedia.

2012/9/24: In Bristol to present two papers at ECML.

2012/9/17: Appointed area chair for ICML 2013.

2012/6/26: In Edinburgh to present a paper at ICML.

2012/1/26: Appointed area chair for the AAAI conference.

2012/1/8: Looking for a bioinformatics/ML postdoc.

2012/1/4: 57 graduate students enrolled for CSE 250B.

2011/11/16: Zhanglong Ji wins best poster award at the San Diego Data Summit meeting.

2011/10/24: Invited to be a plenary keynote speaker at PAKDD'12.

2011/10/8: Our paper Nonlinear SVMs can systematically identify stocks with high and low future returns is among the most downloaded in two SSRN categories, Capital Markets and Econometrics.

2010/5/1: Our paper Predicting labels for dyadic data selected as one of the 7 best from 658 submissions to ECML/PKDD.


(619) 379-9852
