We investigate a Gaussian latent variable model for semi-supervised learning of linear large margin classifiers. The goal of semi-supervised learning is to build predictive models from small collections of labeled examples and large collections of unlabeled ones. For details, please read our paper:


Do-kyum Kim, Matthew Der and Lawrence K. Saul.
A Gaussian Latent Variable Model for Large Margin Classification of Labeled and Unlabeled Data.
In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS 2014). Reykjavik, Iceland.
[paper] [supplement]


Source code

Our implementation is available on GitHub.

Data sets

These are the data sets we used in the paper. Each tar archive contains a term-document matrix and twelve random splits for different numbers of labeled examples. Each split file is named '*_splits12_L#.mat', where '*' and '#' denote the name of the data set and the number of labeled examples, respectively. In the 'mat' file, the variable 'idxLabs' contains the indices of the labeled examples in the split; all other examples are used as unlabeled data.
[20-Newsgroups] [ccat] [gcat] [aut-avn] [real-sim] [Freelancer]

Experimental results

These are the experimental results we reported in our paper:
[20-Newsgroups] [ccat] [gcat] [aut-avn] [real-sim] [Freelancer]

Update on September 10, 2014

We have updated the code for EMBLEM to fix a bug in the initial implementation of eq. (25). The bug had only a slight effect on performance, but for reference we provide updated (and slightly improved) experimental results below. Thanks to Suqi Liu for reporting the bug.
[20-Newsgroups] [ccat] [gcat] [aut-avn] [real-sim] [Freelancer]