DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY OF CALIFORNIA, SAN DIEGO

CSE 254: Seminar on Learning Algorithms

Project Reports

June 2001


Sameer Agarwal, Learning from Incomplete Data.

Abstract:  Survey non-response is an important problem in statistics, economics and social sciences.  This paper reviews the missing data framework of Little and Rubin, and outlines techniques that use a likelihood-based approach to deal with non-response in surveys.  The paper focuses on the case where the probability of an observation being missing depends on its value.  The paper uses the two-step model introduced by Heckman to illustrate and analyze three popular techniques: maximum likelihood, expectation maximization (EM), and Heckman's procedure.  We explain variants of Heckman's model and discuss their advantages and disadvantages.  The paper also compares Heckman's procedure with the EM algorithm, pointing out how Heckman correction is not an EM algorithm.


Kristin Branson, A Naive Bayes Classifier Using Transductive Inference for Text Classification.

Abstract:  In text classification tasks, it is often the case that the set of test documents is known at the time the classifier is learned.  This is the case, for example, when a fixed database of documents is searched for documents similar to a small training set.  The algorithm presented in this paper, transductive naive Bayes (TNB), learns from both the training documents and the distribution of the unclassified test documents.  TNB first labels the test documents using a naive Bayes (NB) classifier learned from the training documents, then learns another classifier from these tentatively labeled test documents.  The classifications made by the second classifier are returned.  TNB has the advantage that its classifier is specific to the actual documents to be classified, and benefits from information about their probability distribution.  Experimental results with TNB show that it significantly outperforms naive Bayes.  On the Reuters-21578 data set with a standard experimental design, TNB achieves average precision-recall breakeven of 81.4%, compared to 55.2% for NB, and 60.8% for the transductive support vector machine method of Joachims.  A detailed experimental investigation shows that TNB gains mostly from tailoring the classifier learned to the actual distribution of the test data.
 
Joseph Drish, Obtaining Calibrated Probability Estimates from Support Vector Machines.
Abstract:  We use a technique known as binning to convert the outputs of support vector machine (SVM) classifiers into well-calibrated probabilities.  Using the KDD'98 data set as a testbed, we evaluate predicted probabilities using four metrics, and compare our results to those obtained by Zadrozny and Elkan for naive Bayes classifiers.  We stress test the latest release of the LIBSVM software, and use its new functionality for training on unbalanced data sets to find the best parameters for the SVM classifiers.  We demonstrate that using the F1 value as a metric for tuning SVMs is successful for making them work on the KDD'98 data set, which is highly unbalanced.  On the other hand, we show that binning works better for naive Bayes classifiers, and that it is difficult to avoid overfitting directly using SVMs.
 
Melanie Dumas, Emotional Expression Recognition using Support Vector Machines.
Abstract:  The objective of this paper is to apply support vector machines (SVMs) to the problem of classifying emotion on images of human faces.  This well-defined problem is complicated by the natural variation in people's faces, requiring the classification algorithm to distinguish a small number of relevant features in a large pool of input features.  Recent experiments using principal component analysis for dimensionality reduction and a perceptron for classification have achieved over 85% classification accuracy.  The experiments described here show that SVMs directly achieve similar accuracy.
 
Gyozo Gidofalvi, Using News Articles to Predict Stock Price Movements.
Abstract:  This paper shows that short-term stock price movements can be predicted using financial news articles.  Given a stock price time series, for each time interval we classify price movement as "up," "down," or (approximately) "unchanged" relative to the volatility of the stock and the change in a relevant index.  Each article in a training set of news articles is then labeled "up," "down," or "unchanged" according to the movement of the associated stock in a time interval surrounding publication of the article.  A naïve Bayesian text classifier is trained to predict which movement class an article belongs to.  Given a test article, the trained classifier potentially predicts the price movement of the associated stock.  However, the efficient markets hypothesis asserts that this classifier cannot have predictive power.  In careful experiments we find definite predictive power for the stock price movement in the interval starting 20 minutes before and ending 20 minutes after news articles become publicly available.


Hector Jasso, Evolving Noise Schedules for Stochastic Search Algorithms for SAT.

Abstract:  In stochastic local search algorithms, the selection of the best noise parameter for a particular problem instance is important in order to optimize the search.  In current algorithms, the noise parameter is defined statically at the start and does not change during the algorithm's run.  This paper explores the effect of changing the noise parameter during the algorithm's run by using a noise schedule. A noise schedule consists of a series of slots, where each slot specifies the number of flips to use a given noise level. Each of the n slots is applied consecutively, and in a wraparound fashion, such that after the nth slot, the first slot is applied again. Since it is time-consuming to construct noise schedules manually, they are instead evolved using genetic algorithms. For a specific 3SAT problem instance, evolved noise schedules for the WalkSAT algorithm are shown to perform comparably to the best static noise level, both in minimizing the average number of flips to solve a problem, and in minimizing the variance in number of flips.
 
David Kauchak, Effect of Boosting in Boosted Wrapper Induction.
Abstract:  A wrapper is a simple, but highly accurate information extraction rule.  Unfortunately, wrappers tend to have low recall.  To remedy this problem, Freitag and Kushmerick developed a method named "boosted wrapper induction" (BWI) that combines a weak wrapper learner with AdaBoost to generate sets of extraction rules.  The result is an algorithm with a bias towards precision, but with reasonable recall in traditional extraction domains.  However, the exact benefit of boosting in BWI is not clear.  We examine the benefits of boosting by comparing BWI to two different sequential covering algorithms with the BWI weak wrapper learner.  Sequential covering is a straightforward algorithm which tries to cover as many possible positive examples with a single rule, removes the covered examples from the training set and continues until all the positive examples have been covered.  We show experimental results from a broad range of information extraction tasks and show that the basic benefit of boosting is to allow BWI to continue learning new and helpful rules without overfitting the training data, even after all positive examples are covered.  This conclusion is consistent with previous theoretical and experimental results.
 
Jonathan Ultis, AdaBoost for Query-by-Example in Text.
Abstract:  This paper describes a new method for generating useful query words automatically from relevance feedback.  A user first performs a standard keyword search, using Google in the present implementation.  If s/he requires higher precision or recall, s/he marks one or more documents in the result set as relevant.  The AdaBoost algorithm is then used to learn a text classifier that separates the marked documents from a collection of irrelevant documents.  The words identified as most informative by AdaBoost are then used as a refined Google query, and the user may repeat the process.  Preliminary results are promising, because the system clearly proposes new query words that are sensible, and useful for improving both precision and recall.  Further research is required to determine how many new keywords to use, how to provide the keywords to the search engine, and which combination of near-misses and far-misses to use as a training set of irrelevant documents.  Readers may try the implementation at http://24.25.217.158/research/example/index.php.


Bianca Zadrozny, Reducing Multiclass to Binary by Coupling Probability Estimates.

Abstract:  This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers.  This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with boosted naive Bayes show that our method produces calibrated class membership probability estimates, while having similar classification accuracy as loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.