We investigate a Gaussian latent variable model for semi-supervised learning of linear large margin classifiers.
The goal of semi-supervised learning is to build predictive models from a small collection of labeled examples
combined with a large collection of unlabeled ones. For details, please see our paper:
Do-kyum Kim, Matthew Der and Lawrence K. Saul.
A Gaussian Latent Variable Model for Large Margin Classification of Labeled and Unlabeled Data.
In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS 2014). Reykjavik, Iceland. [paper] [supplement]
Our implementation is available on GitHub.
These are the data sets we used in the paper.
Each tar archive contains a term-document matrix and twelve random splits for different numbers of labeled examples.
Each split file is named '*_splits12_L#.mat', where '*' and '#' denote the name of the data set
and the number of labeled examples, respectively.
In each 'mat' file, the variable 'idxLabs' contains the indices of the labeled examples in the split; all
other examples are used as unlabeled examples.
[20-Newsgroups] [ccat] [gcat] [aut-avn] [real-sim] [Freelancer]
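As a minimal sketch of how a split file might be read in Python, the snippet below loads a '*_splits12_L#.mat' file and recovers the labeled/unlabeled partition. Only the 'idxLabs' variable is documented above; the function name, the total-example count argument, and the assumption that indices are 0-based are hypothetical (MATLAB files often store 1-based indices, so adjust accordingly).

```python
# Sketch: recover labeled/unlabeled indices from a split file.
# Assumes 'idxLabs' stores one row of labeled-example indices per random
# split; everything else here is an illustrative assumption.
import numpy as np
from scipy.io import loadmat


def split_indices(mat_path, n_examples, split=0):
    """Return (labeled, unlabeled) index arrays for one random split."""
    data = loadmat(mat_path)
    idx_labs = np.atleast_2d(data['idxLabs'])
    labeled = idx_labs[split].ravel()
    # All remaining examples are treated as unlabeled.
    unlabeled = np.setdiff1d(np.arange(n_examples), labeled)
    return labeled, unlabeled
```

With twelve random splits per archive, the `split` argument would select which row of 'idxLabs' to use.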
Update on September 10, 2014
We have updated the code for EMBLEM to fix a bug in the initial implementation of eq. (25). The bug had only a slight effect on performance, but for reference, we provide updated (and slightly improved) experimental results below. Thanks to Suqi Liu for reporting the bug.
[20-Newsgroups] [ccat] [gcat] [aut-avn] [real-sim] [Freelancer]