============================================================================
IJCNLP 2017 Reviews for Submission #151
============================================================================

Title: Estimating Reactions and Recommending Products with Generative Models of Reviews

Authors: Jianmo Ni, Zachary C. Lipton, Sharad Vikram and Julian McAuley

============================================================================
                            REVIEWER #1
============================================================================

---------------------------------------------------------------------------
                            Reviewer's Scores
---------------------------------------------------------------------------
                             Appropriateness: 5
                                     Clarity: 3
                                 Originality: 4
                     Soundness / Correctness: 4
                       Meaningful Comparison: 5
                                   Substance: 5
                    Impact of Ideas / Results: 4
              Impact of Accompanying Software: 1
    Impact of Accompanying Dataset / Resource: 1
                               Recommendation: 4
                          Reviewer Confidence: 4

---------------------------------------------------------------------------
                                 Comments
---------------------------------------------------------------------------

Solid work! The paper is somewhat too dense, since it describes the model with respect to three problems: item recommendation, review generation, and review ranking. It is nice to see illustrative examples for review generation; it would also have been very helpful to see some for the other two problems.

The paper would profit from thorough proofreading.
============================================================================
                            REVIEWER #2
============================================================================

---------------------------------------------------------------------------
                            Reviewer's Scores
---------------------------------------------------------------------------
                             Appropriateness: 5
                                     Clarity: 3
                                 Originality: 4
                     Soundness / Correctness: 3
                       Meaningful Comparison: 4
                                   Substance: 4
                    Impact of Ideas / Results: 4
              Impact of Accompanying Software: 3
    Impact of Accompanying Dataset / Resource: 1
                               Recommendation: 4
                          Reviewer Confidence: 3

---------------------------------------------------------------------------
                                 Comments
---------------------------------------------------------------------------

This paper presents a single model for ranking and generating item reviews and for recommending items, using a combination of text and implicit (click/purchase) feedback. The model is based on collaborative filtering and stacked LSTMs. Overall, the paper is mostly clearly written, makes good reference to previous literature, and presents a clear problem and a novel solution with strong results.

While collaborative filtering for recommendation and character-level stacked LSTMs for generation have been well explored, the combination, and in particular the idea of biasing review generation with user factors, seems novel. The empirical results are strong, the synthetic reviews especially. The exploration of sparsity/cold start is quite nice, especially for practical settings.

Why not use a word embedding model jointly trained across all datasets, or a pretrained one? The validation/test sets of two interactions seem very small; how many instances are there in each test set?
============================================================================
                            REVIEWER #3
============================================================================

---------------------------------------------------------------------------
                            Reviewer's Scores
---------------------------------------------------------------------------
                             Appropriateness: 5
                                     Clarity: 4
                                 Originality: 4
                     Soundness / Correctness: 4
                       Meaningful Comparison: 5
                                   Substance: 4
                    Impact of Ideas / Results: 4
              Impact of Accompanying Software: 1
    Impact of Accompanying Dataset / Resource: 1
                               Recommendation: 5
                          Reviewer Confidence: 4

---------------------------------------------------------------------------
                                 Comments
---------------------------------------------------------------------------

This paper proposes to jointly model a user's preferences for an item and the user's likely review of that item in an online recommendation setting. Using this model, the authors can recommend items to the user (using the preference scores), as well as generate plausible reviews that the user might write about an item he/she has not yet interacted with. The basic idea is that the information in the review text (in the form of embeddings) helps learn better preference scores, and the preference scores (which are computed using the typical collaborative filtering approach) help learn a latent user-item language model. They show that this joint model outperforms existing methods on the item recommendation task.

Overall, the idea is cool, the approach is reasonable, and the evaluation is (mostly) convincing. The paper is well written (see caveat below), and the method is easy to understand and should be possible to replicate.

The task description is a bit confusing at first. There is no clear definition until line 240 (section 3!), and even then it is not intuitively clear why this is a useful task. It only becomes really clear once one understands the model and the evaluation.
An improved task description and motivation in the introduction, clearly stating the advantages of the model and giving some examples of real-world tasks the generated reviews could be used for, would be helpful.

The evaluations are mostly convincing, although RQ1 should really include a baseline version of the preference scores used in this paper (without the information from the reviews), to better understand the contribution. RQ2 could benefit from a human evaluation in addition to perplexity, but it is obviously understandable that this would be a difficult and costly evaluation. Still, something in addition to perplexity (BLEU-like metrics?) would be nice to see. RQ3 is very cool.

Unless I am misunderstanding completely, eq. 2 describes a single-layer MLP, which would be much better described as a linear model or logistic regression. It takes a moment to understand where the "MLP" is, which hurts the coherence of the paper. It is unnecessarily confusing to refer to a single-layer model as "multi-layer".

Minor: 248 - remove duplicate "state"
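To make the point concrete (assuming eq. 2 applies a sigmoid directly to a linear transformation of the input, as I read it): a model with no hidden layer,

$$\hat{y} = \sigma(W x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$

is exactly logistic regression. The term "multi-layer perceptron" only becomes appropriate with at least one hidden nonlinear layer, e.g. $\hat{y} = \sigma\big(W_2\, \phi(W_1 x + b_1) + b_2\big)$.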