####################################################################################################
SIGIR (accepted)
####################################################################################################

------------- Review from Reviewer 1 -------------
Relevance to SIGIR (1-5, accept threshold=3) : 4
Originality of Work (1-5, accept threshold=3) : 4
Technical Soundness (1-5, accept threshold=3) : 4
Quality of Presentation (1-5, accept threshold=3) : 4
Impact of Ideas or Results (1-5, accept threshold=3) : 4
Adequacy of Citations (1-5, accept threshold=3) : 3
Reproducibility of Methods (1-5, accept threshold=3) : 5
Overall Recommendation (1-6) : 4

-- Comments to the author(s):
This is the meta-review by the primary program committee member responsible for your paper, and takes into account the opinions expressed by the referees, the subsequent decision thread, and my own opinions about your work. This is a very original contribution, using a clever method to evaluate recommendations based on styles and substitutes on a large dataset. The main contribution of the paper is that it demonstrates that features of images, and their usage, are clearly useful for recommendation. This is a novel contribution that will raise discussion at the SIGIR conference, and undoubtedly some follow-up work. The referees raised some concerns about the techniques used, which, even as a baseline, seem not that strong/sophisticated. We advise the authors to include a discussion of the fact that they are measuring constructs indirectly, which raises questions about what is really being measured, i.e., whether they are really measuring/predicting appropriate substitutions and complementary styles, and whether the proposed approach will actually improve recommendation performance. We applaud the authors' promise to make the dataset and code of their experiment available to other researchers. Based on the analysis above, we advise the SIGIR program committee meeting to accept the paper, if there is room in the conference program (weak accept).

-- Summary:
A very original contribution, using a clever method to evaluate recommendations. The baseline approaches seem not that strong, and a discussion of what the authors are really measuring (are they really measuring/predicting appropriate substitutions and complementary styles?) is missing. We recommend that the SIGIR PC meeting accept the paper, if there is room in the program.

---------- End of Review from Reviewer 1 ----------

------------- Review from Reviewer 2 -------------
Relevance to SIGIR (1-5, accept threshold=3) : 4
Originality of Work (1-5, accept threshold=3) : 4
Technical Soundness (1-5, accept threshold=3) : 4
Quality of Presentation (1-5, accept threshold=3) : 4
Impact of Ideas or Results (1-5, accept threshold=3) : 3
Adequacy of Citations (1-5, accept threshold=3) : 4
Reproducibility of Methods (1-5, accept threshold=3) : 4
Overall Recommendation (1-6) : 4

-- Comments to the author(s):
As Secondary PCM I have reviewed this submission, the reviews, as well as the discussion, and I concur with the decision.

-- Summary:
As Secondary PCM I have reviewed this submission, the reviews, as well as the discussion, and I concur with the decision.
---------- End of Review from Reviewer 2 ----------

------------- Review from Reviewer 3 -------------
Relevance to SIGIR (1-5, accept threshold=3) : 5
Originality of Work (1-5, accept threshold=3) : 4
Technical Soundness (1-5, accept threshold=3) : 4
Quality of Presentation (1-5, accept threshold=3) : 4
Impact of Ideas or Results (1-5, accept threshold=3) : 4
Adequacy of Citations (1-5, accept threshold=3) : 4
Reproducibility of Methods (1-5, accept threshold=3) : 4
Overall Recommendation (1-6) : 4

-- Comments to the author(s):
This paper studies how to make recommendations based on product image features. It proposes to use logistic regression with learned distance functions to predict the relationships between products. Two distance functions (weighted nearest neighbor and a Mahalanobis transformation) are considered. If we consider image features as meta-features about products, the proposed method (the Mahalanobis transformation) is very related or similar to the bilinear model for recommendation proposed before. Although it is used differently, the authors should cite that work, which can also be used for this task: "Personalized Recommendation on Dynamic Content Using Predictive Bilinear Models." The evaluation on link (i.e., product relationship) prediction looks good. The evaluation on personalized recommendation is weak: no real recommendation evaluation metric is used, and it is not compared with any baseline recommendation methods. It is good that the authors are willing to share the code and data if accepted. It would be better if this paper included WNN with bag-of-words features (words from product descriptions/reviews/metadata, instead of just the category tree) as a stronger baseline. Since WNN with category trees is used as a baseline, WNN with image features is described as an option for distance learning in the proposed approach, and WNN is used in several places without specifying which one the authors are referring to, it would be better to give them different names.

-- Summary:
The problem is important and the proposed solution seems decent. The evaluation on Amazon item-item relationship prediction is good. The evaluation of how the proposed approach will improve recommendation performance is weak (almost missing).

---------- End of Review from Reviewer 3 ----------
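For concreteness, a minimal sketch of the model family Reviewer 3 describes -- a logistic link over a learned distance between image-feature vectors -- could look as follows. This is an illustration under stated assumptions, not the authors' code: the two distance parameterizations stand in for the weighted-nearest-neighbor and Mahalanobis variants the reviewer names, and the offset parameter c is assumed.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def wnn_distance(xi, xj, w):
        # Weighted-nearest-neighbor-style distance: one learned weight per
        # feature dimension, applied to the squared differences.
        return np.dot(w, (xi - xj) ** 2)

    def mahalanobis_distance(xi, xj, M):
        # Mahalanobis-style distance: a full matrix M reweights interactions
        # between feature dimensions, d = (xi - xj)^T M (xi - xj).
        diff = xi - xj
        return diff @ M @ diff

    def link_probability(dist, c=1.0):
        # Shifted sigmoid: small distances map to probabilities near 1,
        # large distances to probabilities near 0.
        return sigmoid(c - dist)

Under this reading, "logistic regression with learned distance functions" means fitting the distance parameters (w or M) and the offset c so that related pairs receive high link probability.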
------------- Review from Reviewer 4 -------------
Relevance to SIGIR (1-5, accept threshold=3) : 4
Originality of Work (1-5, accept threshold=3) : 4
Technical Soundness (1-5, accept threshold=3) : 2
Quality of Presentation (1-5, accept threshold=3) : 4
Impact of Ideas or Results (1-5, accept threshold=3) : 3
Adequacy of Citations (1-5, accept threshold=3) : 3
Reproducibility of Methods (1-5, accept threshold=3) : 4
Overall Recommendation (1-6) : 3

-- Comments to the author(s):
The authors build a recommender system for similar and complementary items based on images. There are some clever things about this paper: the general idea of substitutes and complementary items, the use of a large-scale dataset, and the underlying model. However, it is not entirely convincing. One problem, which the authors acknowledge, is that the Amazon data does not directly measure substitutes/complementarity, so the results may be based on image similarities between categories, or some other factor(s), rather than underlying relationships. Another problem is that style and complementarity are highly personal and presumably also time-dependent, as things move in and out of fashion. Based on the examples shown, the relationships seem rather shallow. Some of the analyses, while interesting, are more anecdotal than convincing. The paper is well written, with few grammatical problems, and easy to read. It is liberally illustrated with visual examples, though these are mostly too small to make the point the authors are trying to make. The authors state that they will share the dataset and code, ensuring reproducibility.

-- Summary:
A clever and entertaining paper, but one that ultimately does not convince that it is measuring/predicting appropriate substitutions and complementary styles.

---------- End of Review from Reviewer 4 ----------

------------- Review from Reviewer 5 -------------
Relevance to SIGIR (1-5, accept threshold=3) : 4
Originality of Work (1-5, accept threshold=3) : 2
Technical Soundness (1-5, accept threshold=3) : 4
Quality of Presentation (1-5, accept threshold=3) : 4
Impact of Ideas or Results (1-5, accept threshold=3) : 4
Adequacy of Citations (1-5, accept threshold=3) : 3
Reproducibility of Methods (1-5, accept threshold=3) : 3
Overall Recommendation (1-6) : 4

-- Comments to the author(s):
The paper proposes a recommendation method for merchandise based on styles and substitutes. The method uses a large dataset of 180 million relationships between 6 million objects, representing Amazon's recommendations, which will be released for academic use. The method transforms visual features (CNN features obtained with Caffe, pretrained on ILSVRC2010 data) by linear metric learning, adapted to the relationships via sigmoid loss minimization with L-BFGS. The experiments are thorough, including recommendation estimation, co-purchase prediction, generating recommendations, etc. The paper is very well written.

Strengths: The paper is very well written and easy to follow. The dataset used in the paper is very challenging and may attract researchers' interest.

Weaknesses: The studied method, even as a baseline, is a bit too weak. The proposed method ultimately uses the simplest form of linear metric learning, which can be fine as a baseline, but it is the simplest option anyone would consider first. One can easily expect other baselines, for example, locality-preserving techniques, max-margin optimization, or a multitask learning framework taking into account within/between-category relationships. CNN features pretrained on ILSVRC may not be suitable for merchandise; at the least, domain adaptation should be considered.

-- Summary:
The paper poses a challenging problem with an interesting dataset; however, the studied baseline technique is too simple, which is disappointing.

---------- End of Review from Reviewer 5 ----------
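To make concrete the training procedure Reviewer 5 summarizes -- linear metric learning fit by sigmoid-loss minimization with L-BFGS -- here is a minimal, self-contained sketch. Everything here is synthetic and assumed (the per-dimension weight parameterization, the offset c, and the data), not the paper's actual setup.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))            # stand-in for CNN image features
    pairs = rng.integers(0, 100, size=(500, 2))
    y = rng.integers(0, 2, size=500)          # 1 = related pair, 0 = unrelated

    def nll(theta):
        # Negative log-likelihood of a shifted sigmoid over a weighted
        # squared-difference distance; w ** 2 keeps the weights nonnegative.
        w, c = theta[:-1], theta[-1]
        d = ((X[pairs[:, 0]] - X[pairs[:, 1]]) ** 2) @ (w ** 2)
        p = 1.0 / (1.0 + np.exp(-(c - d)))
        eps = 1e-12
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    theta0 = np.concatenate([np.ones(16), [1.0]])
    result = minimize(nll, theta0, method="L-BFGS-B")
    print("converged:", result.success, "final loss:", result.fun)

At the paper's scale (millions of items) one would supply an analytic gradient rather than rely on finite differences, but the shape of the optimization is the same.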
####################################################################################################
CVPR (withdrew during rebuttal phase)
####################################################################################################

Review: 1

Question 1. Paper and Review Summary. Briefly describe the contributions of the paper to computer vision. Include a concise, bulleted list of the paper's main strengths and a concise, bulleted list of the paper's main weaknesses. Please keep these brief. You will elaborate on the pros/cons in the subsequent text boxes below.

The paper describes a dataset of 4 relationship types between products, scraped from Amazon, and proposes an image-based product recommendation system. The dataset contains ~6M images and ~180M relationships.
+ Nice big dataset, though I wonder if it can be legally shared.
+ Nice definition of the problem.
- The performance metrics and experimental setup are not completely clear.

2. Paper Strengths. Please discuss the positive aspects of the paper. Be sure to comment on the paper's novelty, technical correctness, clarity and experimental evaluation. Notice that different papers may need different levels of evaluation: a theoretical paper may need no experiments, while a paper presenting a new approach to a known problem may require thorough comparisons to existing methods. Also, please make sure to justify your comments in great detail. For example, if you think the paper is novel, not only say so, but also explain in detail why you think this is the case.

The dataset is quite comprehensive, and the paper is mostly well written. The problem is well defined, and motivated by real-world applications in shopping (on Amazon, for example) but, more fundamentally, by how humans match items together, or prefer particular items to others. A sensible linear model is proposed and executed. I also appreciated the clustering results.

Additional refs:
- Large Scale Visual Recommendations From Street Fashion Images
- "Hi, Magic Closet, Tell Me What to Wear!"
- Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos

The "Magic Closet" paper already has a dataset called "What to Wear", so you may want to rename yours to avoid future confusion.

3. Paper Weaknesses. Please discuss the negative aspects of the paper: lack of novelty or clarity, technical errors, insufficient experimental evaluation, etc. Please justify your comments in great detail. If you think the paper is not novel, explain why and give a reference to prior work. Keep in mind that novelty can take a number of forms; a paper may be novel in terms of the method, the theory, analysis for an existing problem, or the empirical evaluation. If you think there is an error in the paper, explain in detail why it is an error. If you think the experimental evaluation is insufficient, remember that theoretical results/ideas are essential to CVPR and that a theoretical paper need not have experiments. It is *not* okay to reject a paper because it did not outperform other existing algorithms, especially if the theory is novel and interesting. It is also not reasonable to ask for comparisons with unpublished papers and papers published after the CVPR deadline.

Experiment: I'm not exactly sure what the numbers in Tables 3 and 4 represent. Is the task to classify all of the images that have a relationship with the query image into the 4 categories {buy after, also viewed, also bought, bought together}? Are there distractor images (images with no relationship)? It seems to me that the fundamental task is to retrieve, for the query image, all of the images in each of the 4 categories from among a pool of all the catalog items in the dataset. In summary: I found it confusing exactly how the accuracies were measured, and which images were analyzed. Please clarify.

Clarity: Is there a problem with eq (1) or (2)? It seems both can't be true... if the probability is proportional to negative distance, then it can't also be related by a sigmoid, can it?
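A note on this equation question: the two statements are compatible if the model uses a shifted sigmoid of the distance, which is a plausible reading (an assumption here, not a quote from the paper). With

    P(r_ij in R) = sigma(c - d(x_i, x_j)) = exp(c - d(x_i, x_j)) / (1 + exp(c - d(x_i, x_j)))

the link probability decreases monotonically in the distance -- loosely, "proportional to negative distance" -- while still being exactly a sigmoid; the two equations would then describe the same monotone relationship at different levels of precision.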
Fig. 5 -- I don't know who any of those people are, or their names, so the caption wasn't helpful for me. Perhaps add scores? Maybe also add names?

Fig. 8 -- How is it handled when only one clothing item is in the image (e.g., a dress)? Is it automatically fashionable? How do you compare one image with 2 clothing items (1 pair) to another with 3 clothing items (3 pairs), in a way that normalizes for the mismatch? Otherwise, it would seem to be biased toward fewer clothing items.

Dataset: I'm concerned about a) whether the paper deserves credit for sharing the dataset, which is essentially Amazon's creation and property, b) whether scraping Amazon violates its terms, and c) whether sharing Amazon images and data violates its terms. Please comment on whether permission was obtained. From http://www.amazon.com/gp/help/customer/display.html/ref=footer_cou?ie=UTF8&nodeId=508088 : "This license does not include any resale or commercial use of any Amazon Service, or its contents; any collection and use of any product listings, descriptions, or prices; any derivative use of any Amazon Service or its contents; any downloading or copying of account information for the benefit of another merchant; or any use of data mining, robots, or similar data gathering and extraction tools." In my mind, this dataset, while very interesting, may overstep the bounds of what is a gray area for researchers. For example, if Amazon wanted to share this data with researchers, they certainly could, similar to the Netflix challenge.

4. Preliminary Rating: This rating indicates to the area chair, to other reviewers, and to the authors, your current opinion on the paper. Please use 'Borderline' only if the author rebuttal and/or discussion might sway you in either direction.

Borderline

5. Preliminary Evaluation. Please explain your current rating on the paper. This explanation may include how you weight the importance of the various strengths and weaknesses you described above in Q1-Q3. Note, after the rebuttal period, you will be asked to submit a response to the rebuttal and a final rating.

Borderline. Overall, a well-defined problem and approach, but the evaluation is unclear and there is a possible issue with the dataset.

6. Rebuttal Requests: Make a list of items you would like the authors to be sure to address in their rebuttal.

Comment on: 1) the experimental evaluation; 2) the reviewer concerns about the dataset.

7. Confidence. Select: "Very Confident" to stress that you are absolutely sure about your conclusions (e.g., you are an expert who works in the paper's area), "Confident" to stress that you are mostly sure about your conclusions (e.g., you are not an expert but can distinguish good work from bad work in that area), and "Not Confident" to stress that you feel some doubt about your conclusions. In the latter case, please provide details in your Preliminary Evaluation response.

Confident

Review: 2

Question 1. Paper and Review Summary. Briefly describe the contributions of the paper to computer vision. Include a concise, bulleted list of the paper's main strengths and a concise, bulleted list of the paper's main weaknesses. Please keep these brief. You will elaborate on the pros/cons in the subsequent text boxes below.

In this paper, the authors propose an approach to uncover human notions of the visual relationships between objects (items on Amazon, for example). The main contribution to computer vision is leveraging visual features in a recommendation system to compute relationships between two objects.

Pros:
1. The idea of combining visual features and matrix factorization to solve content-based recommendation problems is somewhat novel and seems feasible.
2. The proposed visual distance measure not only measures style similarity, but also style similarity between complementary objects; this is useful in real applications.
3. The proposed approach seems to handle large-scale datasets and does not need a large amount of manual labeling.

Cons:
1. The presentation of the method part is not clear enough. The authors should provide a clearer description of how the distance between two objects is computed from the deep visual features.
2. The problem formulation is also not clear.
3. The recommendation part lacks quantitative evaluations.

2. Paper Strengths. Please discuss the positive aspects of the paper. Be sure to comment on the paper's novelty, technical correctness, clarity and experimental evaluation. Notice that different papers may need different levels of evaluation: a theoretical paper may need no experiments, while a paper presenting a new approach to a known problem may require thorough comparisons to existing methods. Also, please make sure to justify your comments in great detail. For example, if you think the paper is novel, not only say so, but also explain in detail why you think this is the case.

1. The combination of visual features and matrix factorization seems to work well in a content-based recommendation system.
2. The experimental results indicate better performance of the proposed approach over the baselines.

3. Paper Weaknesses. Please discuss the negative aspects of the paper: lack of novelty or clarity, technical errors, insufficient experimental evaluation, etc. Please justify your comments in great detail. If you think the paper is not novel, explain why and give a reference to prior work. Keep in mind that novelty can take a number of forms; a paper may be novel in terms of the method, the theory, analysis for an existing problem, or the empirical evaluation. If you think there is an error in the paper, explain in detail why it is an error. If you think the experimental evaluation is insufficient, remember that theoretical results/ideas are essential to CVPR and that a theoretical paper need not have experiments. It is *not* okay to reject a paper because it did not outperform other existing algorithms, especially if the theory is novel and interesting. It is also not reasonable to ask for comparisons with unpublished papers and papers published after the CVPR deadline.

1. The authors should provide a clearer description of how the distance between two objects is computed from the deep visual features.
2. A table describing the algorithm for measuring d(xi, xj) would make it easier to understand.
3. It is unclear how the sub-categories of an object are distinguished. Weighted nearest neighbor seems to be used, but how is it computed?
4. Generating recommendations is an important application of the approach. However, the dataset used in Fig. 8 is relatively small, containing only 17 examples. Given the large-scale WNW dataset, the paper lacks quantitative evaluations on the recommendation part.

4. Preliminary Rating: This rating indicates to the area chair, to other reviewers, and to the authors, your current opinion on the paper. Please use 'Borderline' only if the author rebuttal and/or discussion might sway you in either direction.

Weak Reject

5. Preliminary Evaluation. Please explain your current rating on the paper. This explanation may include how you weight the importance of the various strengths and weaknesses you described above in Q1-Q3.
Note, after the rebuttal period, you will be asked to submit a response to the rebuttal and a final rating.

The idea of this paper is somewhat novel and the application is good. However, the presentation of the method description is not clear. The novelty is also somewhat diminished by a fairly closely related work that also uses Amazon data and considers both visual similarity and link similarity: Xin Jin, Jiebo Luo, Jie Yu, Gang Wang, Dhiraj Joshi, and Jiawei Han, "Reinforced Similarity Integration in Image-Rich Information Networks", IEEE Transactions on Knowledge and Data Engineering, 25(2):448-460, 2013.

6. Rebuttal Requests: Make a list of items you would like the authors to be sure to address in their rebuttal.

1. Provide a detailed description of the framework: how to compute d(xi, xj), how substitutes and complements are differentiated, and how sub-categories are determined.
2. Provide more quantitative evaluations of the recommendations in the experiment section.

7. Confidence. Select: "Very Confident" to stress that you are absolutely sure about your conclusions (e.g., you are an expert who works in the paper's area), "Confident" to stress that you are mostly sure about your conclusions (e.g., you are not an expert but can distinguish good work from bad work in that area), and "Not Confident" to stress that you feel some doubt about your conclusions. In the latter case, please provide details in your Preliminary Evaluation response.

Confident

Review: 3

Question 1. Paper and Review Summary. Briefly describe the contributions of the paper to computer vision. Include a concise, bulleted list of the paper's main strengths and a concise, bulleted list of the paper's main weaknesses. Please keep these brief. You will elaborate on the pros/cons in the subsequent text boxes below.

The paper studies the influence of visual appearance in the shopping scenario, where we can observe roughly two kinds of functionality between products: substitute and complement relationships. Broadly, the goal of this paper is to study the visual influence on these product-pair relationships, and to utilize it in a recommendation system. The proposed approach is to predict the product relationship using different metrics (1st order, 2nd order) learned from the big data. Experimental evaluation shows that the proposed subspace approach outperforms baselines in prediction, and also examines the geometry within the subspace. As an application of the method, the paper demonstrates the evaluation of outfits from TV shows.
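The "subspace approach" in this summary is plausibly a low-rank Mahalanobis metric; a minimal sketch under that assumption (the names here are illustrative, not from the paper):

    import numpy as np

    def lowrank_distance(xi, xj, E):
        # Project the feature difference into a K-dimensional subspace with
        # a learned K x F matrix E, then take the squared Euclidean norm:
        # d(xi, xj) = (xi - xj)^T E^T E (xi - xj), i.e. M = E^T E has rank K.
        s = E @ (xi - xj)
        return float(s @ s)

Learning E (K*F parameters) instead of a full F x F matrix keeps the model compact, and the rows of E would span the low-dimensional "style space" in which related products end up close together -- the geometry the summary refers to.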
2. Paper Strengths. Please discuss the positive aspects of the paper. Be sure to comment on the paper's novelty, technical correctness, clarity and experimental evaluation. Notice that different papers may need different levels of evaluation: a theoretical paper may need no experiments, while a paper presenting a new approach to a known problem may require thorough comparisons to existing methods. Also, please make sure to justify your comments in great detail. For example, if you think the paper is novel, not only say so, but also explain in detail why you think this is the case.

* Tackles a difficult problem: learning a subtle visual phenomenon.
* Scale of the real-world data in the study.

3. Paper Weaknesses. Please discuss the negative aspects of the paper: lack of novelty or clarity, technical errors, insufficient experimental evaluation, etc. Please justify your comments in great detail. If you think the paper is not novel, explain why and give a reference to prior work. Keep in mind that novelty can take a number of forms; a paper may be novel in terms of the method, the theory, analysis for an existing problem, or the empirical evaluation. If you think there is an error in the paper, explain in detail why it is an error. If you think the experimental evaluation is insufficient, remember that theoretical results/ideas are essential to CVPR and that a theoretical paper need not have experiments. It is *not* okay to reject a paper because it did not outperform other existing algorithms, especially if the theory is novel and interesting. It is also not reasonable to ask for comparisons with unpublished papers and papers published after the CVPR deadline.

* Lack of study and discussion regarding other modalities; e.g., how the visual effect compares to semantic influence.
* Rather limited technical novelty.

4. Preliminary Rating: This rating indicates to the area chair, to other reviewers, and to the authors, your current opinion on the paper. Please use 'Borderline' only if the author rebuttal and/or discussion might sway you in either direction.

Borderline

5. Preliminary Evaluation. Please explain your current rating on the paper. This explanation may include how you weight the importance of the various strengths and weaknesses you described above in Q1-Q3. Note, after the rebuttal period, you will be asked to submit a response to the rebuttal and a final rating.

I like that the paper tackles a very challenging problem: recommendation in a real-world setting. The dataset the paper promises to share with the community should be valuable to any researcher studying a similar problem. I also like the demonstration using TV shows. There are some concerns, though. First, the paper applies the same model to all categories of items. However, I wonder if this strategy is appropriate for some categories (e.g., is there visual influence on Books or Digital Music?). Because of this, it is hard to make sense of the meaning of the improvement for some categories in Tables 3 and 4. Perhaps the best strategy for studying the product relationships here would be to first have a baseline method using semantics (e.g., an NLP-based approach), and then see the relative strength of visual influence in each category. That way, we might see a reasonable justification for applying the proposed method specifically to visually influenced categories like clothing. I would argue that the uniform-noise assumption in the paper is too rough for the analysis in this scenario. In other words, introducing additional analysis based on semantics would make this paper stronger and more convincing. As of now, my rating is borderline. The attempt and the data look nice, but the analysis could be improved.

6. Rebuttal Requests: Make a list of items you would like the authors to be sure to address in their rebuttal.

In Section 4, what is the positive/negative ratio of the data? An unbalanced distribution affects the interpretation of accuracy in Tables 3 and 4 (illustrated with toy numbers below). The figure arrangement looks a bit cluttered; the order goes back and forth, Fig. 8 is never mentioned in the text, etc.
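To illustrate the point about the positive/negative ratio with toy numbers (synthetic, not from the paper): under heavy imbalance, a trivial predictor already scores high raw accuracy, so a balanced test set or a balanced metric is needed for the accuracies in Tables 3 and 4 to be interpretable.

    import numpy as np

    y_true = np.array([1] * 95 + [0] * 5)   # 95% positive pairs (synthetic)
    y_pred = np.ones(100, dtype=int)        # trivial "always related" baseline

    accuracy = float((y_true == y_pred).mean())       # 0.95 -- looks strong
    tpr = float((y_pred[y_true == 1] == 1).mean())    # 1.0
    tnr = float((y_pred[y_true == 0] == 0).mean())    # 0.0
    balanced_accuracy = (tpr + tnr) / 2.0             # 0.5 -- chance level
    print(accuracy, balanced_accuracy)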
7. Confidence. Select: "Very Confident" to stress that you are absolutely sure about your conclusions (e.g., you are an expert who works in the paper's area), "Confident" to stress that you are mostly sure about your conclusions (e.g., you are not an expert but can distinguish good work from bad work in that area), and "Not Confident" to stress that you feel some doubt about your conclusions. In the latter case, please provide details in your Preliminary Evaluation response.

Confident