----------------------- REVIEW 1 ---------------------
SUBMISSION: 3047
TITLE: Personalized Complementary Product Recommendation
AUTHORS: An Yan, Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao, Yi Sun and Julian Mcauley

----------- Overall evaluation -----------
SCORE: 1 (Accept)
----------- Reviewer's confidence -----------
SCORE: 4 (high)
----------- Detailed Comments -----------
This work discusses personalization in the context of complementary product recommendation. The authors highlight the challenges of personalization in this context, given the noisy nature of co-purchases and data sparsity. Their proposed approach consists of a Graph Attention Network for learning product representations and a Transformer for learning user representations (for personalization), and combines both through re-ranking. Their two-stage pipeline is sensible and common in recommender systems. In addition, they take an extra "augmentation step" to generate positive pairs for use in contrastive learning of user representations; however, it is not apparent what sort of cropping or reordering is applied to sessions here, so further clarification would be appreciated.

The results are sound, as is the ablation study, which demonstrates the effectiveness of the various components of the proposed approach. More work on the "case study" would be appreciated, even if included in the appendix: it is important to get a qualitative understanding of the model, and being able to visualize its success and failure modes would be very insightful. For example, the authors highlight that when "paper" has been viewed, it is not recommended again. Is this behaviour consistent across other products? Are there other interesting prediction patterns which align with human intuition?
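For concreteness, the kind of crop/reorder session augmentation commonly used to build positive pairs in contrastive sequential recommendation looks roughly like the sketch below. All function names and ratios here are my own illustration of the general idea, not details taken from the paper:

```python
import random

def crop_session(session, min_keep=0.6):
    """Keep a random contiguous sub-span of the purchase session."""
    n = len(session)
    keep = max(1, int(n * random.uniform(min_keep, 1.0)))
    start = random.randint(0, n - keep)
    return session[start:start + keep]

def reorder_session(session, frac=0.3):
    """Shuffle a random contiguous sub-span of the session in place."""
    s = list(session)
    n = len(s)
    if n < 2:
        return s
    span = min(n, max(2, int(n * frac)))
    start = random.randint(0, n - span)
    segment = s[start:start + span]
    random.shuffle(segment)
    s[start:start + span] = segment
    return s

def make_positive_pair(session):
    """Two stochastic views of the same session form a positive pair."""
    view1 = random.choice([crop_session, reorder_session])(session)
    view2 = random.choice([crop_session, reorder_session])(session)
    return view1, view2
```

Stating explicitly which of these operations are used, and with what ratios, would make the augmentation step reproducible.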
It might be worth including the following as related work for personalization in general and for product representation learning using Transformers: https://dl.acm.org/doi/10.1145/3366424.3386198 and https://arxiv.org/abs/2012.09807

----------------------- REVIEW 2 ---------------------
SUBMISSION: 3047
TITLE: Personalized Complementary Product Recommendation
AUTHORS: An Yan, Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao, Yi Sun and Julian Mcauley

----------- Overall evaluation -----------
SCORE: -1 (Reject)
----------- Reviewer's confidence -----------
SCORE: 3 (medium)
----------- Detailed Comments -----------
== Summary ==
The paper proposes a method for personalized complementary product recommendation. To model items, the paper proposes to use a graph attention network (GAT) to obtain item embeddings. The idea is that by taking item similarities into account one can obtain better item embeddings, i.e. this might help boost recommendations for rarely bought items by propagating information from frequently bought items to similar rarely bought ones. In addition to better item embeddings, the paper proposes to use customer purchase histories to personalize recommendations for each customer. To model purchase histories, a Transformer encoder is used. These two sources of information are fused by attention (the customer purchase-history embedding attends to product embeddings) and trained jointly end-to-end to minimize a combination of three losses: a ranking loss on pairs of items (it is not clear how the positive and negative sets are determined), a ranking loss on pairs of items purchased together (positive pair: purchased together; negative pair: not purchased together), and a self-supervised contrastive loss on pairs of jittered (cropped and reordered) user histories. The results are presented on a (proprietary?)
Amazon office dataset, in terms of Hit@{1,3,10} at the product, sub-category (ITK), and category level, where it outperforms a non-personalized GAT baseline and personalization via simple purchase-history embeddings (averaging).

== Open questions ==
My main concern is the clarity of exposition of the main ideas and the readability of the paper. Although I understand the main points, I could not re-implement the method because many important details are missing. In particular:
- It is not clear what "local connections" are used for aggregation (1st paragraph of Section 2.1). Are these just kNN neighborhoods? If yes, what is the size of the neighborhood? How are these neighbors computed on the (presumably huge) kNN graph?
- It would be good to explicitly state how the attention weights $\alpha$ are used for aggregation.
- It is not clear what is meant by "product-relation embeddings". Are those just the product embeddings output by the GAT?
- It is not clear why recommending the most similar items (where similarity is computed from GAT outputs) yields *complementary* recommendations. "Complementary product recommendation" is in the title of the paper, but it is not explained how complementarity is achieved.
- There is too much notation abuse in Eq. 3; it is very difficult to parse. Maybe instead of $\theta_{\pm i}$ it would be better to use $\theta_y^i$ or $\theta_i^y$, and to add another sum to denote iteration over the elements of the positive and negative sets. How are these sets determined? Or are $\theta_{+i}$ and $\theta_{-i}$ single elements instead of sets? If yes, then how are these elements selected? Random sampling?
- It is not clear how the margin $\lambda$ in Eq. 3 is determined. Is it the same value as $\lambda$ in Eq. 5?
- Is there a reason why only purchases are considered? Including other user events (clicks, add-to-carts) would improve the situation regarding sparsity.
- No CLS token to embed the sequence, and no positional encodings to take the temporal order into account?
What is the architecture of the Transformer encoder?
- It is not clear how the actual re-ranking is done in production. Is it done by ranking the articles according to the norm of the difference vector modulated by the personalized embedding $u$, i.e. by ordering candidates $\theta_{candidate}$ by ascending $||(\theta_i - \theta_{candidate}) * u||$?
- The sentence "The distance between two features are weighted by a user preference embedding u learned from historical purchases" does not correspond to what is displayed in Eq. 5. There, the user preference embedding $u$ modulates the difference vector, and the norm of this modulated difference vector is used to rank the products.
- It is not clear what "co-purchased" refers to, and therefore it is not clear how $\theta_{\pm c}$ are obtained. Does it refer to products purchased in the same session, or purchased by the same customer possibly in different sessions? Later in the paper "co-purchase sessions" are mentioned, but it is still not clear what that refers to exactly. What is the difference between $\theta_{\pm i}$ in Eq. 3 and $\theta_{\pm c}$ in Eq. 5?
- It is mentioned that sequential behavior modeling needs to learn "a huge number of parameters" (from a small training set), but it is not clear how huge that Transformer model is, or why it needs to be huge.
- It is not clear what the values of $\lambda_1$ and $\lambda_2$ are. Is it (0.5, 0.5) or (0.999, 0.001)?
- In Table 1 it is not clear what "Projection" does: how are the "user embeddings" obtained?
- It is mentioned that "[personalized models outperform non-personalized ones], demonstrating the necessity of personalization for complementary product recommendation", but that is not always the case: we see that GAT+Avg consistently underperforms GAT, so how personalization is included matters a lot. It is also not clear why GAT+Avg underperforms GAT.
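To make my reading of the re-ranking concrete: as I understand Eq. 5, production scoring would reduce to something like the following sketch, where the user embedding element-wise modulates the difference vector and candidates are ordered by ascending norm. Variable names are my own; this is my interpretation, not code from the paper:

```python
import numpy as np

def personalized_scores(theta_query, theta_candidates, u):
    """Score each candidate by ||(theta_query - theta_candidate) * u||_2.
    The user embedding u re-weights each embedding dimension of the
    difference vector; a lower score means ranked higher."""
    diffs = theta_query[None, :] - theta_candidates   # shape (n, d)
    modulated = diffs * u[None, :]                    # element-wise product
    return np.linalg.norm(modulated, axis=1)          # shape (n,)

def rerank(theta_query, theta_candidates, u):
    """Return candidate indices sorted by ascending modulated distance."""
    return np.argsort(personalized_scores(theta_query, theta_candidates, u))
```

If this reading is correct, stating it in exactly this form (and saying how the candidate set is generated) would resolve the ambiguity.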
----------------------- REVIEW 3 ---------------------
SUBMISSION: 3047
TITLE: Personalized Complementary Product Recommendation
AUTHORS: An Yan, Chaosheng Dong, Yan Gao, Jinmiao Fu, Tong Zhao, Yi Sun and Julian Mcauley

----------- Overall evaluation -----------
SCORE: 1 (Accept)
----------- Reviewer's confidence -----------
SCORE: 3 (medium)
----------- Detailed Comments -----------
This paper tackles the tricky issue of complementary products in product recommendation. The authors propose to solve the problem by adding a product-relation graph and performing a personalised re-ranking that brings this graph and personalised embeddings together. The paper is well written and the experiments are fairly clear. Section 2.4 could/should be slightly clarified (it is not completely clear what contrastive learning brings conceptually, and a word of explanation as to why it is a good idea would be helpful). A few typos can be corrected (some articles are missing or should be removed, e.g., "(...) modeling, THE/THIS sequential prediction task needs").
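On the contrastive-learning point: conceptually, the two jittered views of the same history form a positive pair while the other users in the batch act as negatives, which should make the user embedding robust to noise in individual purchases. A minimal sketch of such an objective (an InfoNCE-style loss; my illustration of the general idea, not necessarily the paper's exact loss):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    """Contrastive loss on a batch of paired user-history embeddings:
    z1[k] and z2[k] are two augmented views of user k's history (the
    positive pair); all other rows in the batch serve as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature   # (B, B) cosine similarities
    # cross-entropy with the diagonal (matched views) as the correct class
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

A sentence along these lines in Section 2.4 would make the motivation much clearer.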