----------------------- REVIEW 1 ---------------------
PAPER: 1117
TITLE: From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise through Online Reviews
AUTHORS: Julian McAuley and Jure Leskovec

OVERALL EVALUATION: 4 (weak accept)
Contributions: 3 (Novel)
Technical merits: 3 (Strong)
Overall presentation: 2 (Good)
Recommend as a poster?: 1 (Yes)

----------- REVIEW -----------
This paper proposes to improve recommendation systems by modeling user expertise as a changing hidden attribute of users. It is very well presented, novel, and shows large improvements on real data. However, some of the claims are a little strong.

The statement that all of the models in Figure 2 have the same number of parameters is not quite true (Section 2). That is true only if you consider "half" the model, i.e. the five recommendation models. However, the parameters that determine when expertise changes for a given user are also part of your actual model, so model (d) has many more actual parameters.

Also, in Section 5, you observe that experts are the most predictable because they have the lowest MSE. But the users at expertise level 4 usually have the HIGHEST MSE, so there is not really a trend here. To better answer the question of whether experts use extreme ratings while non-experts do not, it would also be great to see the distribution of ratings given at each level of expertise directly, not just the variance.

One other minor question: what are the products on the dotted line in Figure 1? Are they in any way special, given that experts and non-experts agree on them?

----------------------- REVIEW 2 ---------------------
PAPER: 1117
TITLE: From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise through Online Reviews
AUTHORS: Julian McAuley and Jure Leskovec

OVERALL EVALUATION: 4 (weak accept)
Contributions: 2 (Incremental)
Technical merits: 2 (Good)
Overall presentation: 2 (Good)
Recommend as a poster?: 2 (No)

----------- REVIEW -----------
The paper studies the evolution of people's tastes over time as their experience level changes. The authors do so by mining users' online reviews. This is an interesting paper and I feel generally positive about the work. However, there are some concerns.

Inadequate motivation: I am not sure that we need to model change in order to measure the relevance of a product to a user at a particular time. Recommendations are typically made at the point of consumption, so it is not clear to me why this is an important problem: either it is not, or the authors need to do a better job of articulating why modeling the _change_ itself is important. I do agree that it may be reasonable to weight the ratings provided for an item (movie, book, etc.) temporally, so that something the current user or the community rated highly 10 years ago receives less emphasis now, but that does not seem to be exactly what the authors are getting at here, and a simple temporal decay function could do that job anyway (see the sketch below).
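To illustrate the kind of decay baseline I have in mind, here is a minimal sketch; the exponential form, the one-year half-life, and all names are my own assumptions, not anything taken from the paper:

```python
import numpy as np

def decay_weighted_rating(ratings, timestamps, t_now, half_life_days=365.0):
    """Predict an item's score as a temporally decayed average of past ratings.

    Older ratings contribute exponentially less; the one-year half-life is
    an illustrative choice, not a tuned value.
    """
    age_days = (t_now - np.asarray(timestamps, dtype=float)) / 86400.0
    weights = 0.5 ** (age_days / half_life_days)  # weight halves every half-life
    return np.average(np.asarray(ratings, dtype=float), weights=weights)

# Toy usage: ratings given 10 years, 1 year, and 1 week before "now".
t_now = 1_700_000_000  # Unix seconds
ts = [t_now - 10 * 365 * 86400, t_now - 365 * 86400, t_now - 7 * 86400]
print(decay_weighted_rating([5.0, 3.0, 2.0], ts, t_now))  # recent ratings dominate
```

The point is only that such a baseline requires no latent expertise machinery, which is why the motivation for the richer model needs to be made explicit.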
Unclear motivating example: How are experts identified in the motivating example? Since it plays such a central role in the analysis, it seems worth defining that clearly in the paper. The plot also says nothing about how tastes change over time; changes occur within a single user. It might be that people who are defined as experts (in whatever way that happens) simply have different tastes for some other reason. What is really needed to make the point you are trying to make is evidence that, within a particular user, tastes change over time; comparing two populations does not provide the same insight into expertise dynamics.

Other factors explaining differences: The lower variance in the ratings of experts may simply be because experts are similar in other ways beyond their expertise. This may be particularly problematic because experts are identified based on the latent parameter e_ui, which is learned by the model, and the authors do not report any attributes of the users at different values of e_ui beyond the MSE and the time taken to reach each level (which seems like it would be correlated by definition). There should be further investigation into the nature of the differences between users at different experience levels, to understand whether there are variables that explain the low variance better than simply being more expert.

Missing related work: There has been other work on expertise dynamics, although not in the same context. For example, White et al. (WSDM 2009) considered changes in people's expertise over time as measured by the technical content of their queries. The authors should consider that, as well as other work in the information science community on novices and experts, the differences in their behavior, and how novices become experts. There should also be a fair amount of relevant related work in the education and psychology literature on how people become more expert over time, which seems to be largely ignored here.

Missing statistical testing: Although the standard error values in the table are low (and the differences are likely statistically significant), it would be worth verifying this with statistical tests. Also, consider reporting effect sizes given the large N; something along the lines of the sketch below would suffice.
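As a concrete, hypothetical example of what I mean (this assumes per-review squared errors are available for each pair of models on the same test reviews; the function and variable names are mine):

```python
import numpy as np
from scipy import stats

def compare_models(errs_a, errs_b):
    """Paired significance test plus effect size for two models' per-review
    squared errors on the same held-out reviews.

    With very large N almost any difference is "significant", so Cohen's d
    on the paired differences is reported alongside the p-value.
    """
    errs_a, errs_b = np.asarray(errs_a), np.asarray(errs_b)
    t_stat, p_value = stats.ttest_rel(errs_a, errs_b)  # paired t-test
    diff = errs_a - errs_b
    cohens_d = diff.mean() / diff.std(ddof=1)          # effect size for paired data
    return t_stat, p_value, cohens_d
```

A non-parametric alternative (e.g. scipy.stats.wilcoxon on the same paired errors) would also be reasonable if the error differences are far from normal.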
Overall, this is an interesting paper that models the effect of expertise differences on ratings. I do have some concerns about the nature of the research presented, but I feel that these could be addressed by the authors.

----------------------- REVIEW 3 ---------------------
PAPER: 1117
TITLE: From Amateurs to Connoisseurs: Modeling the Evolution of User Expertise through Online Reviews
AUTHORS: Julian McAuley and Jure Leskovec

OVERALL EVALUATION: 5 (strong accept)
Contributions: 3 (Novel)
Technical merits: 3 (Strong)
Overall presentation: 3 (Excellent)
Recommend as a poster?: 2 (No)

----------- REVIEW -----------
This paper carries out a careful investigation into the development and evolution of users' ability to discern product/service quality, through examination of the artefacts of users' online ratings. It was delightful to read that the authors not only validated their findings across multiple datasets, but also propose to provide the datasets they developed (crawled from public sources) as part of their publication. The authors outline important research findings regarding the characteristics of user expertise (such as that users who are more likely to abandon a community write fewer reviews and take longer to reach the same expertise levels). The authors compare different methods of modeling evolution in expertise, including evolution at uniform intervals per user and per community, and also at learned intervals for the same groupings. Their findings suggest that modeling the evolution of individual users' expertise is the most successful approach, and that no more than 5 levels are needed to discriminate between users' expertise from a pragmatic perspective.

The authors describe their latent factor modeling approach very clearly and give indications of the likely computational costs of training the models. Several insights are provided in the Qualitative Analysis section, and I was particularly intrigued by the observations on acquired tastes as indicators of developing expertise. The authors raise interesting questions for future work; I wonder whether there is an analogy between curriculum learning techniques from ML and the questions about recommending sequences of products to help people become experts.

Some very minor suggestions to improve the paper:
- Section 2, Model Specification: "... Note that is is not" -> "Note that it is not".
- Section 4, Evaluation, second-last paragraph: "... at all experience levels (as in Table 2) rather than considering users' most recent reviews (as in Table 3)". I think these table references are the wrong way around, given the titles of the respective tables (Table 2 is "Results on users' most recent reviews").
- Figure 5: it was not obvious to me why the x-axis on these graphs is not 0-based (even though the first expertise level is 1). That is, the x-axis implies that there are 4 intervals (1-2, 2-3, 3-4, 4-5), but I was expecting to see 5 intervals, one for each expertise level.
- Figure 5 (or the commentary in the corresponding section on Experience Progression): it would have been interesting to hear your thoughts on why there is an emerging increase in variance for Amazon Foods and Amazon Movies right at the end of the experience-level development. These are small bumps, to be sure, but still a little surprising. Do experts on Amazon become more random, and if so, why?

Overall, the paper was a pleasure to read: clear, well motivated, generalizable, and soundly carried out. I came away feeling I had learned several very interesting new observations about the nature of user modeling for online review systems (and perhaps about the development of human expertise more generally).