Link prediction via matrix factorization

MATLAB code is available here. The scripts implement both the basic classification loss and the ranking loss described in the paper.
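The Python sketch below illustrates how the two losses differ. It scores a dyad (i, j) by the inner product of latent node vectors. It then compares a per-dyad logistic (classification) loss with a pairwise logistic (ranking) loss, which asks every present link to outscore every absent one. The following are all our assumptions for illustration: the symmetric inner-product score, the logistic surrogates, and the hypothetical names `score`, `classification_loss` and `ranking_loss`. The losses and regularization in the MATLAB scripts may differ.

```python
import math

# Hypothetical helper names; an illustrative sketch, not the authors' code.

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def score(U, i, j):
    """Assumed factorization score for dyad (i, j): inner product of the nodes' latent vectors."""
    return sum(a * b for a, b in zip(U[i], U[j]))

def classification_loss(U, dyads):
    """Mean logistic loss over labelled dyads (i, j, y), with y = 1 for a link and 0 otherwise."""
    total = 0.0
    for i, j, y in dyads:
        s = score(U, i, j)
        total += softplus(-s) if y == 1 else softplus(s)
    return total / len(dyads)

def ranking_loss(U, links, non_links):
    """Mean pairwise logistic loss: each present link should score above each absent one."""
    total = 0.0
    for i, j in links:
        for k, l in non_links:
            total += softplus(score(U, k, l) - score(U, i, j))
    return total / (len(links) * len(non_links))
```

The ranking loss depends only on differences between scores. It therefore targets the ordering of dyads (and hence AUC) rather than calibrated link probabilities.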

The file link_prediction_test_script is a sample script that demonstrates usage on a synthetic network. It illustrates both the classification and ranking losses, with and without side-information. Sample output of the script:

>> link_prediction_test_script;
testing setting: standard loss, # node features = 0, # link features = 0
rmse of predicted vs true probabilities = 0.2205
optimal auc = 0.6959
predicted auc = 0.6524

...

testing setting: ranking loss, # node features = 4, # link features = 1
rmse of predicted vs true probabilities = 0.3257
optimal auc = 0.7260
predicted auc = 0.7126
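The sketch below shows one way to compute the two metrics in this output:

* **AUC**: the fraction of (link, non-link) pairs that a score ranks correctly, with ties counted as one half.
* **RMSE**: the root-mean-square error between predicted and true link probabilities.

We assume that "optimal auc" is the AUC obtained by scoring with the true probabilities of the synthetic network. The MATLAB script may compute these quantities differently, for example with a faster rank-based AUC. The function names are hypothetical.

```python
import math

# Hypothetical helper names; an O(n_pos * n_neg) illustration, not the script's implementation.

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

def rmse(predicted, true):
    """Root-mean-square error between predicted and true probabilities."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true))
```

For example, `auc([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0])` is 0.75, because three of the four (positive, negative) pairs are ordered correctly.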


Response prediction using collaborative filtering with hierarchies and side-information

MATLAB code for the methods in this paper is available here. We are unable to provide the MapReduce code, as it was produced for use in a corporate environment. However, the MATLAB code illustrates the essential ingredients needed to implement the methods, which require only simple gradient-based optimization. The code currently includes methods that were not discussed in the paper; we will clean it up shortly to minimize confusion.
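To illustrate what "simple gradient-based optimization" can look like, here is a minimal Python sketch of one stochastic gradient step for a logistic latent factor model with side-information. It makes the following assumptions:

* The score is s = u·v + w·x, where u and v are latent vectors for the two entities of a dyad and x is a side-feature vector.
* The hierarchy is modelled by an L2 penalty that pulls u toward its parent's vector.

These modelling choices and the name `hierarchical_sgd_step` are hypothetical. They are not taken from the paper or its code.

```python
import math

# Hypothetical model and helper names; an illustration of a gradient step, not the paper's method.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hierarchical_sgd_step(u, v, w, x, y, u_parent, eta, lam):
    """One SGD step on the logistic loss of the score s = u.v + w.x for a label y in {0, 1},
    plus the assumed penalty (lam / 2) * ||u - u_parent||^2 tying u to its parent.
    Returns updated copies of u, v and w."""
    g = sigmoid(dot(u, v) + dot(w, x)) - y  # derivative of the logistic loss w.r.t. the score
    new_u = [ui - eta * (g * vi + lam * (ui - pi)) for ui, vi, pi in zip(u, v, u_parent)]
    new_v = [vi - eta * g * ui for ui, vi in zip(u, v)]
    new_w = [wi - eta * g * xi for wi, xi in zip(w, x)]
    return new_u, new_v, new_w
```

For brevity, only u carries the hierarchical penalty here. A fuller model would typically regularize v and w as well.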

The role of calibration in predicting accurate probabilities

The code for this paper is available here. It includes the experiment showing the lack of calibration of misspecified linear and logistic regression models, and our script comparing various probability paradigms. We will add detailed instructions on how to replicate the experiments in the paper shortly.
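Calibration is commonly checked with a reliability table. Predictions are grouped into probability bins, and each bin's mean predicted probability is compared with the observed frequency of positive outcomes. The Python sketch below builds such a table and summarizes the gaps as a count-weighted average, often called the expected calibration error. The equal-width binning and the function names are our assumptions for illustration. The experiments in the paper may measure calibration differently.

```python
# Hypothetical helper names; equal-width binning is an assumption, not the paper's protocol.

def reliability_table(probs, outcomes, n_bins=10):
    """Return (mean predicted probability, empirical frequency, count) for each non-empty bin.
    Probabilities are assumed to lie in [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes into the last bin
        bins[b].append((p, y))
    table = []
    for contents in bins:
        if contents:
            mean_p = sum(p for p, _ in contents) / len(contents)
            freq = sum(y for _, y in contents) / len(contents)
            table.append((mean_p, freq, len(contents)))
    return table

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Count-weighted mean of |mean predicted probability - empirical frequency| over bins."""
    n = len(probs)
    return sum(count / n * abs(mean_p - freq)
               for mean_p, freq, count in reliability_table(probs, outcomes, n_bins))
```

A perfectly calibrated predictor has every bin's mean prediction equal to its empirical frequency, giving an error of zero.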

A log-linear model with latent features for dyadic prediction

A stochastic gradient implementation of the LFL model may be found here. The code assumes that there are structures Tr and Te for the training and test sets, each comprising three vectors i, j, and r for the "user", "movie", and "rating", respectively. (As noted in the paper, these may be replaced with more general dyadic entities.) The code handles both nominal and ordinal "ratings". Sample usage:

k = 10; % # of latent features
eta0 = 0.01; % learning rate
lambda = 1e-6; % regularization parameter
epochs = 10; % # of sweeps over training set
loss = 'mse'; % loss function on training set

[w, trainErrors, testErrors] = lflSGDOptimizer(Tr, Te, k, eta0, lambda, epochs, loss);
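As we understand the paper, the core of the LFL model is a log-linear (softmax) model in which each possible rating value y has its own user and movie latent vectors. Thus p(y | i, j) is proportional to exp(alpha[y][i] · beta[y][j]). The Python sketch below shows three pieces:

* the probability computation;
* one stochastic gradient step on the regularized negative log-likelihood;
* prediction, by the expected rating (ordinal) or the most probable rating (nominal).

Ratings are assumed to be relabelled 0, ..., R-1 (add 1 to the expected value for a 1..R scale). All names are hypothetical. This is not a port of lflSGDOptimizer, whose 'mse' loss option and other details may differ.

```python
import math
import random

# Hypothetical names; a sketch of the LFL softmax model, not a port of lflSGDOptimizer.

def lfl_probs(alpha, beta, i, j):
    """p(y | i, j) for every label y: softmax of the per-label scores alpha[y][i] . beta[y][j]."""
    scores = [sum(a * b for a, b in zip(alpha[y][i], beta[y][j])) for y in range(len(alpha))]
    m = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def lfl_sgd_step(alpha, beta, i, j, r, eta, lam):
    """In-place SGD step on -log p(r | i, j) plus (lam / 2) times the squared norms
    of the latent vectors involved."""
    p = lfl_probs(alpha, beta, i, j)
    for y in range(len(alpha)):
        g = p[y] - (1.0 if y == r else 0.0)  # derivative of -log p(r | i, j) w.r.t. score y
        a, b = alpha[y][i], beta[y][j]
        for k in range(len(a)):
            ak, bk = a[k], b[k]
            a[k] = ak - eta * (g * bk + lam * ak)
            b[k] = bk - eta * (g * ak + lam * bk)

def lfl_predict(alpha, beta, i, j, ordinal=True):
    """Expected label for ordinal ratings; most probable label for nominal ones."""
    p = lfl_probs(alpha, beta, i, j)
    if ordinal:
        return sum(y * py for y, py in enumerate(p))
    return max(range(len(p)), key=lambda y: p[y])

# Usage: random initialization, then repeated SGD on one training triple (user 0, movie 1, rating 2).
rng = random.Random(0)
R, n_users, n_movies, k = 3, 2, 2, 2
alpha = [[[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)] for _ in range(R)]
beta = [[[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_movies)] for _ in range(R)]
before = lfl_probs(alpha, beta, 0, 1)[2]
for _ in range(50):
    lfl_sgd_step(alpha, beta, 0, 1, 2, eta=0.5, lam=1e-6)
after = lfl_probs(alpha, beta, 0, 1)[2]
```

Giving each label its own latent vectors is what lets the same model handle nominal labels, for which no ordering is assumed.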


Fast Algorithms for Approximating the Singular Value Decomposition

The code for this paper is available here. To run the code, unzip the archive and execute the following in MATLAB:

A = ...; % Load the data matrix
kMin = ...;
kMax = ...;
compareSVDMethods(A, kMin:kMax)
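The archive's compareSVDMethods function is the reference for which methods are compared. As one common ingredient of fast approximate SVD, the Python sketch below implements length-squared column sampling. Columns are drawn with probability proportional to their squared norms and rescaled so that C C^T is an unbiased estimate of A A^T. The top singular vectors of the much smaller matrix C can then stand in for those of A. The function names are hypothetical, and the methods in the paper may differ.

```python
import math
import random

# Hypothetical helper names; length-squared sampling shown as a representative technique only.

def column_sampling_probs(A):
    """Length-squared probabilities p_j = ||A[:, j]||^2 / ||A||_F^2 (A is a nonzero list of rows)."""
    col_sq = [sum(row[j] ** 2 for row in A) for j in range(len(A[0]))]
    total = sum(col_sq)
    return [c / total for c in col_sq]

def sample_columns(A, c, rng):
    """Draw c column indices i.i.d. with length-squared probabilities and rescale each
    sampled column by 1 / sqrt(c * p_j), so that E[C C^T] = A A^T."""
    p = column_sampling_probs(A)
    idx = rng.choices(range(len(p)), weights=p, k=c)  # zero-norm columns are never drawn
    return [[row[j] / math.sqrt(c * p[j]) for j in idx] for row in A]
```

One would then compute an exact SVD of the m x c matrix C and use its top-k left singular vectors to approximate those of A. With this rescaling, C always has exactly the same Frobenius norm as A.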

Last updated: March 09, 2011