----------------------- REVIEW 1 ---------------------
PAPER: 59
TITLE: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
AUTHORS: Julian McAuley and Jure Leskovec

OVERALL EVALUATION: 3 (strong accept)
REVIEWER'S CONFIDENCE: 3 (medium)
Reviewer's Confidence: 2 (medium)
Relevance to RecSys: 5 (excellent)
Novelty: 5 (excellent)
Technical Quality: 5 (excellent)
Significance: 4 (good)
Presentation and Readability: 4 (good)
Reproduce the results presented: 1 (not easy)
Candidate for best paper award: 2 (Yes)
Suitable to be accepted as a practitioner report as part of the industrial session: 1 (No)

----------- REVIEW -----------
The paper presents a novel model that takes advantage of a large quantity of typically unused data in reviews: their text. The proposed HFT (hidden factors as topics) model outputs topics that are correlated with the user and item latent factors. The experiments show that this alignment is useful for a variety of tasks, including rating prediction, alleviating the cold-start problem, identifying useful reviews, and genre discovery. An important observation concerns the size of the employed dataset: in excess of 42 million reviews.

As a whole, this is an excellent paper. Nicely written, it provides a novel solution to an old problem: how to use the review text. I do, however, have some questions:

- Section 4.6: The choice of the number of topics (K=5 / 10) is as surprising as the results obtained. While the top words for each of the topics in all the cases in Table 4 would seem to match an intuitive subcategory, those subcategories are only some of many. For clothing, for instance, there are many more types. So, although it may seem that these topics are correlated with actual categories, that may not be the case. Wouldn't a much higher value of K be needed to actually perform this matching? This is, to me, the most interpretable and weakest part of the paper.
That space might be put to better use by providing additional details about the model.
- Section 4.6: It would be interesting to see the original topics, not those generated after removing the averages.
- Section 4.7: The quantitative results in Section 4.7 refer to the Yelp dataset, which is also the only one where K=10 is shown. However, in Table 5, the results for K=5 look even more impressive in terms of improvement over latent factors and LDA. Why not show those?
- Section 4.4, Table 2: The results for K=10 are superior to those for K=5. Why not go further? Given the size of the dataset, I would expect many more topics to be present. The authors wonder why the product and user topics seem to be the same. I wonder why there aren't any noticeable differences if the topics change altogether.
- Section 4.2: Given the size of most datasets, a user-oriented CF approach might also give good results. Why only compare with product-centric methods?
- Section 3: The description of Equation 9 is really unclear.

----------------------- REVIEW 2 ---------------------
PAPER: 59
TITLE: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
AUTHORS: Julian McAuley and Jure Leskovec

OVERALL EVALUATION: 3 (strong accept)
REVIEWER'S CONFIDENCE: 4 (high)
Reviewer's Confidence: 3 (high)
Relevance to RecSys: 5 (excellent)
Novelty: 4 (good)
Technical Quality: 4 (good)
Significance: 5 (excellent)
Presentation and Readability: 4 (good)
Reproduce the results presented: 2 (fair)
Candidate for best paper award: 2 (Yes)
Suitable to be accepted as a practitioner report as part of the industrial session: 1 (No)

----------- REVIEW -----------
Strong points:
* Novel and interesting idea of combining latent-factor rating prediction and hidden-topic discovery using LDA
* Usefulness of the proposed method illustrated for different tasks (rating prediction, cold start, genre discovery, etc.)
and on several *large* benchmark datasets
* Significant improvements obtained for rating prediction, especially when few ratings are available
* Easy to follow
* Technically sound

Weak points:
* The number of factors and the number of hidden topics have to be the same. The optimal number of factors is typically greater than what is used in this paper (e.g., K=5), and perhaps greater than the useful number of hidden topics. The impact of using more factors is unclear.
* The proposed method seems computationally expensive (L-BFGS) and could potentially find a low-quality solution (a local minimum). Runtimes should be provided for the proposed method and the baselines.

----------------------- REVIEW 3 ---------------------
PAPER: 59
TITLE: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
AUTHORS: Julian McAuley and Jure Leskovec

OVERALL EVALUATION: 0 (borderline paper)
REVIEWER'S CONFIDENCE: 3 (medium)
Reviewer's Confidence: 3 (high)
Relevance to RecSys: 3 (fair)
Novelty: 3 (fair)
Technical Quality: 3 (fair)
Significance: 3 (fair)
Presentation and Readability: 3 (fair)
Reproduce the results presented: 2 (fair)
Candidate for best paper award: 1 (No)
Suitable to be accepted as a practitioner report as part of the industrial session: 1 (No)

----------- REVIEW -----------
NULL

------------------------- METAREVIEW ------------------------
PAPER: 59
TITLE: Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text
RECOMMENDATION: accept

This submission proposes a method exploiting the text of reviews, which is typically ignored, to improve rating prediction / item recommendation. The problem is important and the solution novel. The experimental evaluation on several large datasets is compelling. Last but not least, the paper is well-written.
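As an editorial aside for readers without the submission at hand: the factor-topic coupling that Reviews 1 and 2 discuss (and the reason the number of factors must equal the number of topics, per Review 2's first weak point) is, as I understand the published HFT model, a softmax link from an item's latent factor vector to its topic distribution, with rating error and corpus likelihood traded off in a single objective. The sketch below is my paraphrase under that understanding, not a reproduction of the submission's Equation 9:

```latex
% Paraphrased sketch of the HFT coupling (symbols as I recall the published
% paper; not the submission's own equation numbering).
% Predicted rating from offset, biases, and K-dimensional latent factors:
\mathrm{rec}(u, i) = \alpha + \beta_u + \beta_i + \gamma_u \cdot \gamma_i
% Softmax link tying item factors to a K-topic distribution
% (hence the number of factors and topics must match):
\theta_{i,k} = \frac{\exp(\kappa\,\gamma_{i,k})}{\sum_{k'} \exp(\kappa\,\gamma_{i,k'})}
% Joint objective: squared rating error minus a weighted corpus log-likelihood:
f = \sum_{(u,i)} \bigl(\mathrm{rec}(u, i) - r_{u,i}\bigr)^2 - \mu\,\ell(\mathcal{T} \mid \theta, \phi, z)
```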
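On Review 2's second weak point (L-BFGS cost and local minima): a minimal, hypothetical illustration, not the authors' code, of fitting a tiny non-convex latent-factor rating objective with L-BFGS. All names, sizes, and data here are invented for the sketch; it only shows the shape of such a fit and why the solver can stop at a local minimum:

```python
# Hypothetical sketch: L-BFGS fit of a tiny latent-factor rating model.
# The objective is non-convex in the factors, so L-BFGS may converge to a
# local minimum, which is the concern Review 2 raises.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_users, n_items, k = 8, 6, 3  # toy sizes, invented for illustration
ratings = [(u, i, rng.integers(1, 6))  # synthetic 1..5 star ratings
           for u in range(n_users) for i in range(n_items)]

def unpack(x):
    # Split the flat parameter vector into user and item factor matrices.
    gu = x[:n_users * k].reshape(n_users, k)
    gi = x[n_users * k:].reshape(n_items, k)
    return gu, gi

def objective(x, lam=0.1):
    # Squared rating error plus an L2 regularizer on all parameters.
    gu, gi = unpack(x)
    err = sum((gu[u] @ gi[i] - r) ** 2 for u, i, r in ratings)
    return err + lam * (x @ x)

x0 = rng.normal(scale=0.1, size=(n_users + n_items) * k)
res = minimize(objective, x0, method="L-BFGS-B")
print(res.success, round(res.fun, 2))
```

Each such solve is one inner step; a full HFT-style fit would alternate it with a topic-assignment step over millions of reviews, which is why reporting runtimes, as the reviewer asks, matters.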