The review report from reviewer #1:

*1: Is the paper relevant to ICDM?
[_] No  [X] Yes

*2: How innovative is the paper?
[_] 5 (Very innovative)  [_] 4 (Innovative)  [X] 3 (Marginally)  [_] 2 (Not very much)  [_] 1 (Not)  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)  [_] 4 (High)  [X] 3 (Good)  [_] 2 (Needs improvement)  [_] 1 (Low)  [_] 0 (Very low)

*4: How is the presentation?
[_] 5 (Excellent)  [X] 4 (Good)  [_] 3 (Above average)  [_] 2 (Below average)  [_] 1 (Fair)  [_] 0 (Poor)

*5: Is the paper of interest to ICDM users and practitioners?
[X] 3 (Yes)  [_] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
[_] 2 (High)  [X] 1 (Medium)  [_] 0 (Low)

*7: Overall recommendation
[_] 5 (Strong Accept: top quality)  [_] 4 (Accept: a regular paper)  [X] 3 (Weak Accept: a short paper)  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)  [_] 1 (Reject: will argue to reject it)  [_] 0 (Strong Reject: hopeless)

*8: Summary of the paper's main contribution and impact
This paper presents models for rating systems in which reviews are treated as having multiple aspects. Three prediction tasks are considered: (1) predicting which parts of a review refer to which aspect; (2) summarizing reviews on a per-user basis to explain why a user gave a certain rating; and (3) rating prediction. Experiments are presented mainly on a beer dataset whose reviews cover several aspects of beer. Three models are considered: unsupervised, semi-supervised, and supervised. The models themselves are not sophisticated or novel; they are fairly straightforward adaptations of existing methods. The experiments are reasonably thorough and show reasonable performance compared to other models in the literature. The aspect modeling itself is quite interesting, and it will be useful to see how it carries over to other domains (outside of beer data).

*9: Justification of your recommendation
I would like to see more of the following to consider this a strong contribution:
1. More general experiments on other domains, to confirm that the models are easy to develop for other domains.
2. A weaker set of assumptions on reviews, which would more closely model the real world.
3. More scalability comparisons. (Currently the authors have not tried to optimize existing methods to make them more scalable, which would by itself be a very good contribution.)

*10: Three strong points of this paper (please number each point)
1. Modeling aspects and associating parts of reviews with each aspect is important.
2. The models are natural and simple.
3. Experiments demonstrate reasonable success, at least on the beer dataset discussed throughout the paper.

*11: Three weak points of this paper (please number each point)
1. Some assumptions made by the authors are not practical. They insist on each review discussing ALL the aspects and on each sentence discussing exactly one aspect. In practice, a sentence may discuss zero or more aspects, not exactly one.
2. A review as a whole may not discuss ALL the aspects; it may discuss only taste, or only feel.
3. The primary dataset on which evaluations are presented is the beer dataset, while the problem is much more general. Given that the models are relatively simple, more emphasis needs to be placed on the experiments section; at least two or three different domains would make a more compelling case for the models.

*12: Detailed comments for the authors
The paper is reasonably well presented. The models are simple and straightforward.
I have some comments on some of the material in the paper, which I raise below.
1. (a) In practice, a sentence in a review may talk about zero or more aspects. (b) A review need not cover ALL the aspects. In both scenarios there is information from which you can learn. However, you make the stricter assumptions that a sentence discusses exactly ONE aspect and that a review covers ALL aspects. How practical is this in the real world?
2. In many instances there are TOO many reviews for a product (popularity, good or bad). Does it make sense to do a per-user summary in such a case? For lack of time, users may only be interested in a collective, authoritative summary of the product rather than in individual reviews.
3. It is not evident from directly reading the paper why your methods are more scalable than those in the literature.

========================================================

The review report from reviewer #2:

*1: Is the paper relevant to ICDM?
[_] No  [X] Yes

*2: How innovative is the paper?
[_] 5 (Very innovative)  [_] 4 (Innovative)  [X] 3 (Marginally)  [_] 2 (Not very much)  [_] 1 (Not)  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)  [_] 4 (High)  [X] 3 (Good)  [_] 2 (Needs improvement)  [_] 1 (Low)  [_] 0 (Very low)

*4: How is the presentation?
[_] 5 (Excellent)  [_] 4 (Good)  [X] 3 (Above average)  [_] 2 (Below average)  [_] 1 (Fair)  [_] 0 (Poor)

*5: Is the paper of interest to ICDM users and practitioners?
[X] 3 (Yes)  [_] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
[X] 2 (High)  [_] 1 (Medium)  [_] 0 (Low)

*7: Overall recommendation
[_] 5 (Strong Accept: top quality)  [_] 4 (Accept: a regular paper)  [X] 3 (Weak Accept: a short paper)  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)  [_] 1 (Reject: will argue to reject it)  [_] 0 (Strong Reject: hopeless)

*8: Summary of the paper's main contribution and impact
The paper presents scalable methods for modeling aspects in reviews. Since there are plenty of reviews on the Web, and semi-supervised and unsupervised models can easily benefit from them, the scalability of aspect models is an important issue that has not been studied well.

*9: Justification of your recommendation
There is little discussion, and there are no experimental results, on scalability: the paper explains neither why the previous methods are not scalable nor what technical innovation makes the proposed model more scalable, and no experimental results on scalability are presented. The significance and effectiveness of this work are therefore not clear, although the problem the paper tackles is important.

*10: Three strong points of this paper (please number each point)
S1: The paper is overall clear and well-organized.

*11: Three weak points of this paper (please number each point)
W1: The scalability of the method is not discussed enough.
W2: No experimental results on scalability are presented.
W3: The model could be described in a more principled manner.

*12: Detailed comments for the authors
The paper presents scalable methods for modeling aspects in reviews. Since there are plenty of reviews on the Web, and semi-supervised and unsupervised models can easily benefit from them, building scalable models is an important research topic that has not been studied well.
There is little discussion, and there are no experimental results, on scalability: the paper explains neither why the previous methods are not scalable nor what technical innovation makes the proposed model more scalable, and no experimental results on scalability are presented. The significance and effectiveness of this work are therefore still unclear. Even if existing models are not directly applicable to large-scale data, it would be possible to first learn parameters from a relatively small amount of labeled data and then predict aspects in unlabeled test data while keeping the parameters fixed. The accuracy of such a method would provide a better baseline.

The model could be described in a more principled manner. Both the unsupervised and supervised settings can be formulated as optimizing the same objective, such as the hinge loss; a sketch follows at the end of these comments.

Several notations are confusing: a sentence is referred to as $s$ or $r_{is}$ in different places. The authors claim that the baseline LDA is categorized as a semi-supervised approach (Section VI-A). But since it only uses labels for aligning topics and aspects, the process is very different from the ordinary semi-supervised setting. Brody & Elhadad's paper is from NAACL, not ACL.
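To make the hinge-loss suggestion concrete, here is one possible formulation. The notation is mine, not the paper's: $w_a$ is a weight vector for aspect $a$, $\phi(s)$ is a feature vector for sentence $s$, and $a_s$ is the aspect assigned to $s$.

$$\min_{\{w_a\}} \ \sum_{s} \max\Bigl(0,\ 1 - w_{a_s}^\top \phi(s) + \max_{a' \neq a_s} w_{a'}^\top \phi(s)\Bigr) + \lambda \sum_{a} \lVert w_a \rVert^2$$

In the supervised setting $a_s$ is the given aspect label; in the unsupervised setting it is replaced by the model's own current prediction, $a_s = \arg\max_a w_a^\top \phi(s)$, as in latent-variable max-margin methods. The two settings would then differ only in where the labels come from, which would make the presentation more uniform.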
========================================================

The review report from reviewer #3:

*1: Is the paper relevant to ICDM?
[_] No  [X] Yes

*2: How innovative is the paper?
[_] 5 (Very innovative)  [X] 4 (Innovative)  [_] 3 (Marginally)  [_] 2 (Not very much)  [_] 1 (Not)  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
[_] 5 (Very high)  [_] 4 (High)  [X] 3 (Good)  [_] 2 (Needs improvement)  [_] 1 (Low)  [_] 0 (Very low)

*4: How is the presentation?
[_] 5 (Excellent)  [X] 4 (Good)  [_] 3 (Above average)  [_] 2 (Below average)  [_] 1 (Fair)  [_] 0 (Poor)

*5: Is the paper of interest to ICDM users and practitioners?
[_] 3 (Yes)  [X] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
[_] 2 (High)  [X] 1 (Medium)  [_] 0 (Low)

*7: Overall recommendation
[_] 5 (Strong Accept: top quality)  [X] 4 (Accept: a regular paper)  [_] 3 (Weak Accept: a short paper)  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)  [_] 1 (Reject: will argue to reject it)  [_] 0 (Strong Reject: hopeless)

*8: Summary of the paper's main contribution and impact
This paper proposes a model for analyzing review data that contain multiple aspects and ratings. The core idea is to model aspects and ratings as a function of the words appearing in the reviews. The paper proposes three different learning methods. The proposed model can be used for different applications, such as determining which parts of a review correspond to each rated aspect, finding the sentences that best summarize a review, and recovering missing ratings.

*9: Justification of your recommendation
In general, the paper is well written. The proposed model seems sound and the technical depth of the paper is strong. The experimental results reported in the paper verify the effectiveness of the proposed model. My concern with this paper is that it does not provide any scalability results, and some assumptions made in the paper may not always hold in real-life scenarios.

*10: Three strong points of this paper (please number each point)
S1. The paper proposes a probabilistic model of aspects and aspect ratings as a function of the words appearing in the reviews. The model seems sound and the technical depth is strong.
S2. The paper develops three different learning strategies: unsupervised, semi-supervised, and supervised learning.
S3. Several large-scale real review data sets are adopted for evaluation, and the proposed model outperforms some existing methods.

*11: Three weak points of this paper (please number each point)
W1. The paper claims that the proposed method scales to large data sets, but presents no scalability results.
W2. The proposed model implicitly assumes that every aspect in the review data is supported by at least one sentence. This may not be true in many real-life scenarios.

*12: Detailed comments for the authors
The paper proposes a probabilistic model of aspects and aspect ratings as a function of the words appearing in the reviews. The model seems sound and the technical depth is strong. The experimental evaluation is good. In general, the paper is well written and easy to follow.

My major concern is that the paper does not provide any scalability results. Although the paper states that the proposed method scales well, and the experiments on large-scale data sets show better results than existing methods, it would still be desirable to report some scalability tests. A simple strategy is to train on different fractions of the data set and examine running time and performance; a sketch of such a test follows these comments.

My other concern is that some assumptions made in the paper may not always hold. For example, the proposed model implicitly assumes that every aspect in the review data is supported by at least one sentence. However, user reviews quite often do not cover all the listed aspects. If this situation occurs frequently in the review data, the performance of the proposed model may degrade. Could you please comment on that?
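As a concrete illustration of the scalability test suggested above, the following sketch trains on growing fractions of the data and records wall-clock time and held-out performance. The `train_model` and `evaluate` callables are placeholders for the authors' own training and evaluation routines, not code from the paper.

    import random
    import time

    def scalability_test(reviews, train_model, evaluate,
                         fractions=(0.1, 0.25, 0.5, 1.0), seed=0):
        """Train on growing fractions of `reviews`; record time and score.

        `train_model(train_set)` returns a fitted model, and
        `evaluate(model, test_set)` returns a performance score; both are
        stand-ins for the authors' own routines.
        """
        rng = random.Random(seed)
        data = list(reviews)
        rng.shuffle(data)
        split = int(0.9 * len(data))  # hold out a fixed 10% test set
        train_pool, test_set = data[:split], data[split:]
        results = []
        for frac in fractions:
            subset = train_pool[: int(frac * len(train_pool))]
            start = time.time()
            model = train_model(subset)
            elapsed = time.time() - start
            score = evaluate(model, test_set)
            results.append((len(subset), elapsed, score))
            print(f"n={len(subset):>8}  time={elapsed:8.1f}s  score={score:.3f}")
        return results

Plotting training time against training-set size from these runs would show directly whether training cost grows roughly linearly with the amount of data, which is the scalability claim at issue.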