Data is available at http://cseweb.ucsd.edu/~jmcauley/pml/data/. Download it and save it to a local directory.

Visual compatibility model

This code reads image data in a specific binary format, described here: https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon/links.html
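As a rough sketch of what such a reader might look like, the function below assumes each record is a 10-byte ASCII item ID followed by 4096 32-bit floats in native byte order; the linked page is the authoritative description of the layout. The name `read_image_features` and its `dim` and `id_bytes` parameters are illustrative, and the demo reads a tiny synthetic in-memory stream rather than the real file.

```python
import array
import io

def read_image_features(stream, dim=4096, id_bytes=10):
    """Yield (item_id, features) pairs from a binary stream.

    Assumed record layout: `id_bytes` bytes of ASCII item ID,
    then `dim` 32-bit floats in native byte order.
    """
    float_size = array.array('f').itemsize
    while True:
        raw_id = stream.read(id_bytes)
        payload = stream.read(float_size * dim)
        if len(raw_id) < id_bytes or len(payload) < float_size * dim:
            break  # end of stream, or a truncated final record
        feats = array.array('f')
        feats.frombytes(payload)
        yield raw_id.decode('ascii'), feats.tolist()

# Synthetic demo: two records with 4-dimensional features (made-up IDs).
buf = io.BytesIO()
for item_id, vec in [("B000000001", [0.0, 1.0, 2.0, 3.0]),
                     ("B000000002", [0.5, 0.25, 0.0, -1.0])]:
    buf.write(item_id.encode('ascii'))
    buf.write(array.array('f', vec).tobytes())
buf.seek(0)

records = list(read_image_features(buf, dim=4))
print(len(records), records[0][0])
```

For the real data, the same function could be applied to a file opened with `open(path, 'rb')`, leaving `dim` at its default.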

Extract metadata describing ground-truth compatibility relationships among items

Number of compatible pairs
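The extraction and counting steps might be sketched as follows. This assumes each metadata record is a dict with an `asin` key and an optional `related` dict whose values are lists of item IDs; which relation counts as "compatible" (here `also_viewed`) is an assumption, as are the function name and the toy records.

```python
def extract_compatible_pairs(records, has_image, relation="also_viewed"):
    """Collect (i, j) pairs where j is listed under `relation` for item i.

    Only pairs in which both items have image features are kept.
    """
    pairs = []
    for d in records:
        i = d.get("asin")
        if i not in has_image:
            continue
        for j in d.get("related", {}).get(relation, []):
            if j in has_image and j != i:
                pairs.append((i, j))
    return pairs

# Toy metadata (made-up IDs) standing in for the real file.
records = [
    {"asin": "A1", "related": {"also_viewed": ["A2", "A3", "A9"]}},
    {"asin": "A2", "related": {"bought_together": ["A1"]}},
    {"asin": "A3"},
    {"asin": "A4", "related": {"also_viewed": ["A1"]}},
]
has_image = {"A1", "A2", "A3"}  # items for which image features were read

pairs = extract_compatible_pairs(records, has_image)
print(len(pairs))  # number of compatible pairs
```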

Define the compatibility model
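One way such a model could be written, shown here in plain Python rather than a deep learning framework, is to project each image feature vector into a low-dimensional "style space" and pass an offset minus the squared Euclidean distance through a sigmoid. The names `E`, `c`, and `compat_probability` are illustrative, and the parameters below are fixed by hand rather than learned.

```python
import math

def project(E, x):
    """Project feature vector x (length F) using E, given as K rows of length F."""
    return [sum(w * v for w, v in zip(row, x)) for row in E]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def compat_probability(E, c, x_i, x_j):
    """sigmoid(c - d), where d is the squared distance between the projections."""
    d = squared_distance(project(E, x_i), project(E, x_j))
    return 1.0 / (1.0 + math.exp(d - c))

# Hand-picked toy parameters: keep the first two of three feature dimensions.
E = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
c = 1.0

x_a = [1.0, 2.0, 5.0]
x_b = [1.0, 2.0, -3.0]   # identical to x_a after projection
x_c = [4.0, 6.0, 0.0]    # far from x_a after projection

p_ab = compat_probability(E, c, x_a, x_b)
p_ac = compat_probability(E, c, x_a, x_c)
print(p_ab, p_ac)
```

Items that land close together in the projected space receive a probability above 0.5 when `c` is positive, while distant items are pushed toward zero.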

Exercises

9.1

For these exercises we use musical instrument data because (a) it has fine-grained subcategories (e.g. "accessories", "guitars") that the exercises can make use of; and (b) it is small. Ideally these exercises would be run on a large category such as clothing images, though such datasets are larger and more difficult to work with.

First collect the subcategories associated with each item (for use in Exercise 9.3)
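A minimal sketch of this step, assuming each metadata record carries a `categories` field holding a list of category paths (each path a list of names from general to specific). Taking the entry at `depth=1` of the first path as "the" subcategory is an assumption, as are the function name and toy records.

```python
from collections import defaultdict

def collect_subcategories(records, depth=1):
    """Map each item to a subcategory and group items by subcategory."""
    category_of = {}
    items_by_category = defaultdict(set)
    for d in records:
        paths = d.get("categories") or []
        if not paths or len(paths[0]) <= depth:
            continue  # no subcategory at the requested depth
        sub = paths[0][depth]
        category_of[d["asin"]] = sub
        items_by_category[sub].add(d["asin"])
    return category_of, items_by_category

# Toy records (made-up IDs).
records = [
    {"asin": "A1", "categories": [["Musical Instruments", "Guitars", "Electric"]]},
    {"asin": "A2", "categories": [["Musical Instruments", "Guitars"]]},
    {"asin": "A3", "categories": [["Musical Instruments", "Accessories"]]},
    {"asin": "A4", "categories": [["Musical Instruments"]]},
    {"asin": "A5"},
]

category_of, items_by_category = collect_subcategories(records)
print(category_of)
```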

Read image data

Extract compatibility relationships. Build our collection of "difficult" negatives consisting of items from the same category.
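The negative-sampling idea can be sketched as below: for a positive pair (i, j), draw a negative k from the same subcategory as j, excluding items already known to be compatible with i. Using j's subcategory (rather than i's) is an assumption, and the helper name and toy data are hypothetical.

```python
import random

def sample_hard_negative(i, j, category_of, items_by_category, positives, rng):
    """Pick an item from j's category that is not a known positive for i.

    Returns None when the category offers no valid candidate.
    """
    candidates = [k for k in items_by_category[category_of[j]]
                  if k != i and k != j and (i, k) not in positives]
    return rng.choice(candidates) if candidates else None

# Toy data (made-up IDs).
category_of = {"a1": "accessories", "g1": "guitars", "g2": "guitars", "g3": "guitars"}
items_by_category = {"accessories": ["a1"], "guitars": ["g1", "g2", "g3"]}
positives = {("a1", "g1"), ("a1", "g2")}

rng = random.Random(0)  # seeded for reproducibility
k = sample_hard_negative("a1", "g1", category_of, items_by_category, positives, rng)
none_left = sample_hard_negative("g1", "a1", category_of, items_by_category, positives, rng)
print(k, none_left)
```

Such negatives are "difficult" because the model cannot separate them from positives using category alone.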

9.2 / 9.3

Modify the model to compute similarity based on the inner product rather than Euclidean distance
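To make the change concrete, the sketch below contrasts the two scoring rules on vectors that have already been projected into the embedding space. Replacing the negative squared distance with the raw inner product, while keeping the same offset `c`, is one possible reading of the exercise rather than a prescribed solution; the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def euclidean_score(s_i, s_j, c=0.0):
    """sigmoid(c - ||s_i - s_j||^2): high when the embeddings are close."""
    d = sum((u - v) ** 2 for u, v in zip(s_i, s_j))
    return sigmoid(c - d)

def inner_product_score(s_i, s_j, c=0.0):
    """sigmoid(c + <s_i, s_j>): high when the embeddings are aligned."""
    ip = sum(u * v for u, v in zip(s_i, s_j))
    return sigmoid(c + ip)

# Two embeddings pointing the same way but with different lengths.
s_i = [1.0, 0.0]
s_j = [2.0, 0.0]

p_euc = euclidean_score(s_i, s_j)
p_ip = inner_product_score(s_i, s_j)
print(p_euc, p_ip)
```

The two rules can disagree: here the vectors are aligned, so the inner product scores them as compatible, while the distance-based score penalizes their difference in length.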

Compare models based on the inner product and Euclidean distance. Both make use of "difficult" negatives (Exercise 9.3)

Compute accuracy (the fraction of positive relationships that the model predicts as positive)
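A minimal sketch of this metric, assuming the model outputs a probability for each held-out positive pair and that "predicted as positive" means the probability exceeds 0.5. The threshold and the function name are assumptions, and the probabilities below are made up for illustration.

```python
def positive_accuracy(probs, threshold=0.5):
    """Fraction of positive pairs whose predicted probability exceeds `threshold`."""
    if not probs:
        raise ValueError("no predictions to evaluate")
    return sum(p > threshold for p in probs) / len(probs)

# Made-up model outputs for four positive pairs.
probs = [0.9, 0.6, 0.4, 0.75]
acc = positive_accuracy(probs)
print(acc)
```

Because only positive pairs are scored, a model that labels every pair as compatible would also reach 1.0 on this measure.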

9.4

t-SNE embedding

Scatterplots colored by subcategory aren't particularly interesting in this case. Try coloring by price or brand instead for more compelling examples.