----------------------- REVIEW 1 ---------------------
PAPER: 599
TITLE: Addressing Complex and Subjective Product-Related Queries with Customer Reviews
AUTHORS: Julian Mcauley and Alex Yang
OVERALL EVALUATION: 2 (accept)

----------- REVIEW -----------
In this work, the authors present a method for answering queries about product attributes based on customer reviews. The authors use a mixture-of-experts (MoE) approach that combines several well-known classifiers as features, and they evaluate their system on yes/no and open-ended questions using data from Amazon. The authors' system shows significant improvement over the individual classifiers; the authors also subjectively evaluate their system using Turkers.

I found this paper to be very easy to read. The prose is clear, well organized, and well motivated. The evaluation is very thorough, and the discussion section is appreciated.

Specific comments:

It's unfortunate that the authors call their system Square, given the name collision with other systems.

The last sentence of 4.2 has grammar issues.

5.5: The authors state that Turkers were presented with the top three ranked results from two methods. However, the two methods may predict overlapping results. In this case, is the Turker shown fewer than six results (not counting the random result), or are lower-ranked results pulled in? Furthermore, in cases where the top three results overlap, how is performance evaluated (i.e., both methods were "correct", so which method is awarded the victory)?

----------------------- REVIEW 2 ---------------------
PAPER: 599
TITLE: Addressing Complex and Subjective Product-Related Queries with Customer Reviews
AUTHORS: Julian Mcauley and Alex Yang
OVERALL EVALUATION: 1 (weak accept)

----------- REVIEW -----------
The paper introduces a system to retrieve reviews that are relevant to queries in Q&A systems such as the one available on Amazon. I think the paper tackles an interesting, challenging, and relevant problem. The authors do a good job of motivating the problem, they fully describe their system with technical details in an effective way, and their results are reasonably strong.

In my view, the paper makes two important contributions. First, the idea of using reviews to answer queries on demand is novel, and implementing an effective system that achieves good results could be very useful in the context of Amazon as well as many other settings. Second, the authors develop such a system, Square, and test its effectiveness using both machine-labeled and human-labeled data.

The paper does have some weaknesses. First, while Square performs better than other methods on the machine-labeled data, the improvement does not seem to be very large (see Figure 2 and Table 3). The authors claim that the improvement is substantial, but they do not explain why an improvement of 4%, on average, is significant enough; it seems to me that it may not be. The improvement of Square on the human-labeled data seems much more significant, which is promising. Second, the way they validate the open-ended questions may be problematic. Identifying a “true” answer in a pool of candidates does not really test whether a relevant answer can be identified when the true answer is not in the pool. In particular, the “true” answer is likely to share many linguistic attributes with the question, which is probably not the case for reviews that were not directly answering the given question.
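As context for the mixture-of-experts approach described in Review 1 (several classifier scores combined with learned weights to rank reviews against a question), the following is a minimal, hypothetical sketch of such a weighted score combination. The expert functions, weights, and example data below are illustrative placeholders and are not taken from the paper.

```python
import numpy as np

def word_overlap(question, review):
    """Toy expert: fraction of question words that also appear in the review."""
    q_words = set(question.lower().split())
    r_words = set(review.lower().split())
    return len(q_words & r_words) / max(len(q_words), 1)

def length_prior(question, review):
    """Toy expert: mildly prefer longer (more informative) reviews, capped at 1.0."""
    return min(len(review.split()) / 100.0, 1.0)

EXPERTS = [word_overlap, length_prior]
WEIGHTS = np.array([0.8, 0.2])  # placeholder weights; a real system would learn these

def rank_reviews(question, reviews, experts=EXPERTS, weights=WEIGHTS, top_k=3):
    """Rank reviews by a weighted combination of expert relevance scores."""
    scores = [float(np.dot(weights, [expert(question, r) for expert in experts]))
              for r in reviews]
    order = np.argsort(scores)[::-1][:top_k]
    return [reviews[int(i)] for i in order]

if __name__ == "__main__":
    reviews = [
        "The battery easily lasts two full days of heavy use.",
        "Shipping was fast and the box was undamaged.",
        "Battery life is poor; it barely lasts a morning.",
    ]
    print(rank_reviews("How long does the battery last?", reviews, top_k=2))
```

In the paper's actual setup, the experts would presumably be the individual classifiers evaluated in the experiments, and the mixture weights would be learned from labeled data rather than fixed by hand.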
----------------------- REVIEW 3 ---------------------
PAPER: 599
TITLE: Addressing Complex and Subjective Product-Related Queries with Customer Reviews
AUTHORS: Julian Mcauley and Alex Yang
OVERALL EVALUATION: 0 (borderline paper)

----------- REVIEW -----------
This is a well-done empirical study on Amazon reviews and Q&A. The authors attempt to combine Q&A and reviews, answering questions by directly leveraging the reviews. The work is based on a relatively simple mixture-of-experts model applied to a large Amazon dataset.

The work definitely has some merits:
- large dataset
- good explanation
- lots of sound experiments with good baselines

Some of the main issues:
- What is the actual novelty here? Despite the stated contributions and the discussion of how this work differs from prior work, I still see this as a conventional opinion-mining work.
- How was the data labeled? The section is not clear. Who are the labelers? What was labeled exactly?
- The technical contribution is overall small; the added value of the paper lies mostly in the large set of experiments.
- The problem seems to be of somewhat limited importance, or very straightforward, despite the claims made in the paper.

------------------------- METAREVIEW ------------------------
There is no metareview for this paper.