Data is available at http://cseweb.ucsd.edu/~jmcauley/pml/data/. Download it and save it to your own directory.

Factorization Machine (fastFM)

Parse the Goodreads comic book data (excluding review text)

For example...

Utility data structures. Most importantly, each user and item is mapped to an ID from 1 to nUsers/nItems
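A minimal sketch of such mappings, assuming each parsed record is a dict with `user_id` and `book_id` fields (the field names and the toy records are assumptions for illustration). IDs here are consecutive integers starting at 0 so that they can double as column indices; add one if 1-based IDs are preferred.

```python
# Toy records standing in for the parsed data (hypothetical values).
data = [
    {"user_id": "u_a", "book_id": "b_x", "rating": 5},
    {"user_id": "u_b", "book_id": "b_x", "rating": 3},
    {"user_id": "u_a", "book_id": "b_y", "rating": 4},
]

userIDs, itemIDs = {}, {}
for d in data:
    u, i = d["user_id"], d["book_id"]
    # Assign the next unused integer the first time a user or item is seen.
    if u not in userIDs:
        userIDs[u] = len(userIDs)
    if i not in itemIDs:
        itemIDs[i] = len(itemIDs)

nUsers, nItems = len(userIDs), len(itemIDs)
```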

Build the factorization machine design matrix. Note that each instance is a row, and the columns encode both users and items. Other features could straightforwardly be added.
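The row layout can be sketched in plain Python as follows. The helper name `design_rows` and the toy indices are hypothetical; at realistic scale the same layout would normally be stored in a sparse matrix (for example `scipy.sparse.lil_matrix`) rather than dense lists.

```python
def design_rows(interactions, n_users, n_items):
    """Build one row per (user, item) interaction.

    Columns [0, n_users) one-hot encode the user;
    columns [n_users, n_users + n_items) one-hot encode the item.
    """
    rows = []
    for u, i in interactions:
        row = [0] * (n_users + n_items)
        row[u] = 1              # user indicator
        row[n_users + i] = 1    # item indicator, offset past the user block
        rows.append(row)
    return rows

# Hypothetical (user index, item index) pairs for 2 users and 3 items.
X = design_rows([(0, 2), (1, 0)], n_users=2, n_items=3)
```

Extra features would simply occupy further columns appended after the item block.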

Target (rating) to predict for each row

Initialize the factorization machine
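For reference, here is a sketch of what a second-order factorization machine computes for a single row, written in plain Python rather than with fastFM. The function names and the toy parameters are hypothetical. `fm_predict` uses the usual linear-time rewriting of the pairwise term, and `fm_predict_naive` spells out the sum over feature pairs as a check.

```python
def fm_predict(x, w0, w, V):
    """Prediction for one feature vector x.

    w0: global bias; w: one linear weight per feature;
    V: one list of k latent factors per feature.
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        # 0.5 * ((sum)^2 - sum of squares) equals the sum over pairs i < j.
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

def fm_predict_naive(x, w0, w, V):
    """Same model, with the pairwise interactions written out explicitly."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total

# Toy row: user 0 of 2 and item 1 of 2, with made-up parameters.
x = [1, 0, 0, 1]
w0, w = 3.5, [0.1, -0.2, 0.3, 0.4]
V = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0], [0.5, -1.0]]
```

With only a one-hot user and a one-hot item active, the prediction reduces to a bias, a user term, an item term, and the inner product of the user's and item's factors.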

Split data into train and test portions
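One simple way to make such a split, assuming a random 80/20 partition of the rows (the fraction, the seed, and the helper name are assumptions; the source does not state how the split is made):

```python
import random

def train_test_split_rows(X, y, train_fraction=0.8, seed=0):
    """Shuffle the row indices once, then cut them into train and test parts."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])
```

Shuffling indices rather than the rows themselves keeps each row of `X` aligned with its target in `y`.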

Train the model

Extract predictions on the test set
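Once predictions are available, one common summary is the mean squared error against the held-out ratings. The sketch below assumes predictions and labels are equal-length sequences of numbers; the values shown are made up.

```python
def mse(predictions, labels):
    """Mean squared error between two equal-length sequences."""
    diffs = [(p - l) ** 2 for p, l in zip(predictions, labels)]
    return sum(diffs) / len(diffs)

# Hypothetical predictions and held-out ratings.
example_error = mse([4.0, 3.0, 5.0], [4.0, 2.0, 3.0])
```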

Exercises

6.1

Simple example, just incorporating a one-hot encoding of the year (see data extraction in examples above)
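A sketch of the year encoding, assuming the year has already been extracted as an integer and that the range of years is known in advance (the helper name and the range used below are hypothetical):

```python
def one_hot_year(year, min_year, max_year):
    """One column per year in [min_year, max_year]; all zeros if out of range."""
    vec = [0] * (max_year - min_year + 1)
    if min_year <= year <= max_year:
        vec[year - min_year] = 1
    return vec

# Hypothetical row: user 0 of 2, item 1 of 2, review written in 2014,
# with years 2012..2015 covered by the encoding.
row = [1, 0] + [0, 1] + one_hot_year(2014, 2012, 2015)
```

The year columns are appended after the user and item blocks, so the rest of the pipeline is unchanged apart from the wider matrix.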

6.2

Cold start plots. Count training instances per item (could also measure coldness per user if we had user features).
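The counting and grouping step might look like the sketch below, which averages the squared test error over items with the same number of training instances. The function name and toy inputs are hypothetical; the resulting dictionary can be passed to any plotting routine.

```python
from collections import defaultdict

def error_by_item_count(train_items, test_items, sq_errors):
    """Mean squared test error, grouped by each test item's training count."""
    n_train = defaultdict(int)
    for i in train_items:
        n_train[i] += 1
    grouped = defaultdict(list)
    for i, e in zip(test_items, sq_errors):
        # Items never seen in training fall into the count-0 (coldest) bucket.
        grouped[n_train.get(i, 0)].append(e)
    return {c: sum(es) / len(es) for c, es in sorted(grouped.items())}

# Toy example: item "a" has 2 training rows, "b" has 1, "c" has none.
curve = error_by_item_count(
    train_items=["a", "a", "b"],
    test_items=["a", "b", "c", "c"],
    sq_errors=[0.5, 1.0, 2.0, 4.0],
)
```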

6.3

Read social data from epinions

BPR model. First we'll use a regular BPR model just to assess similarity between friends' latent representations. Later we can implement different social sampling assumptions just by passing different samples to the same model.
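A plain-Python sketch of a BPR update, assuming the common scoring form of an item bias plus the inner product of user and item factors (the parameter names, learning rate, and regularization strength are assumptions, and this is not the notebook's implementation). Each step takes a triple (u, i, j) in which user u consumed item i but not item j, so changing the sampling assumption only changes which triples are fed in.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_score(u, i, beta, gamma_u, gamma_i):
    """Compatibility of user u and item i: item bias plus inner product."""
    return beta[i] + sum(a * b for a, b in zip(gamma_u[u], gamma_i[i]))

def bpr_step(u, i, j, beta, gamma_u, gamma_i, lr=0.05, reg=0.01):
    """One gradient ascent step on ln sigmoid(x_ui - x_uj) with an L2 penalty."""
    x_uij = (bpr_score(u, i, beta, gamma_u, gamma_i)
             - bpr_score(u, j, beta, gamma_u, gamma_i))
    g = 1.0 - sigmoid(x_uij)  # derivative of ln sigmoid(x) with respect to x
    # Copy the old vectors so that every update uses pre-step values.
    gu, gi, gj = gamma_u[u][:], gamma_i[i][:], gamma_i[j][:]
    for f in range(len(gu)):
        gamma_u[u][f] += lr * (g * (gi[f] - gj[f]) - reg * gu[f])
        gamma_i[i][f] += lr * (g * gu[f] - reg * gi[f])
        gamma_i[j][f] += lr * (-g * gu[f] - reg * gj[f])
    beta[i] += lr * (g - reg * beta[i])
    beta[j] += lr * (-g - reg * beta[j])

# Toy data: user 0 consumed items {0, 1}; user 1 consumed items {2, 3}.
rng = random.Random(0)
n_users, n_items, k = 2, 4, 2
consumed = {0: {0, 1}, 1: {2, 3}}
beta = [0.0] * n_items
gamma_u = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
gamma_i = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

for _ in range(2000):
    u = rng.randrange(n_users)
    i = rng.choice(sorted(consumed[u]))
    j = rng.choice([x for x in range(n_items) if x not in consumed[u]])
    bpr_step(u, i, j, beta, gamma_u, gamma_i)
```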

First, train a regular BPR model (no social terms)

Compute similarities among friends' latent representations
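The comparison can be sketched as below, assuming cosine similarity between item factor vectors as the measure (the source does not say which similarity is used, and the helper names and toy vectors are hypothetical). The same `mean_similarity` call serves for both random pairs and pairs linking an item to one consumed by a friend; only the list of pairs differs.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def mean_similarity(pairs, gamma_i):
    """Average cosine similarity over a list of (item, item) index pairs."""
    sims = [cosine(gamma_i[i], gamma_i[j]) for i, j in pairs]
    return sum(sims) / len(sims)

# Toy item factors: items 0 and 1 point the same way, item 2 is orthogonal.
toy_gamma_i = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
toy_mean = mean_similarity([(0, 1), (0, 2)], toy_gamma_i)
```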

Similarity between randomly chosen pairs of items

Similarity between an item and one consumed by a friend

(The similarity is not particularly high, but it is still significantly higher than for random pairs.)

6.4

Implement the social model. Uses the model above, just with different samples.
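One possible social sampling assumption, sketched below, is that items consumed by a friend (but not by the user) rank between the user's own items and items that neither has consumed. This is an assumption chosen for illustration; the source does not state which sampling scheme is used, and the function and argument names are hypothetical. The sampler returns item indices that can then be fed to a BPR update as ordinary preference pairs.

```python
import random

def sample_social(u, consumed, friends, n_items, rng):
    """Sample (i, k, j) for user u, or None if no valid triple exists.

    i: item consumed by u
    k: item consumed by a friend of u but not by u
    j: item consumed by neither u nor any friend of u
    """
    own = consumed.get(u, set())
    social = set()
    for v in friends.get(u, ()):
        social |= consumed.get(v, set())
    social -= own
    negatives = [x for x in range(n_items)
                 if x not in own and x not in social]
    if not own or not social or not negatives:
        return None
    i = rng.choice(sorted(own))
    k = rng.choice(sorted(social))
    j = rng.choice(negatives)
    return i, k, j

# Toy data: user 0 consumed item 0 and is friends with user 1,
# who consumed items 0 and 1. User 2 has no friends listed.
consumed = {0: {0}, 1: {0, 1}, 2: {3}}
friends = {0: [1]}
triple = sample_social(0, consumed, friends, n_items=4, rng=random.Random(0))
```

Under this assumption, each sampled triple yields two training pairs for the same model: (u, i, k) and (u, k, j).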