Paper ID: 4419
Paper Title: Complete the Look: Scene-based Complementary Product Recommendation

META-REVIEWER #1
META-REVIEW QUESTIONS

3. Decision summary
Accept: The paper received mixed or borderline reviews. The area chairs considered the paper, rebuttal, and reviewer comments, and decided to accept the paper. This decision has been confirmed by the AC panel. See comments below for details.

4. AC comments on decision
The reviews are quite mixed on this paper. R2, the most negative reviewer, was concerned about the lack of comparison with other methods. The rebuttal addresses this point rather convincingly, but R2 did not mention the rebuttal or respond to it directly in his final assessment. R2 was confused by Fig. 7, which the authors will address, but this confusion led to a misunderstanding of what is available at test time, a critical element. R3's review was positive, but very terse and lacked any substantive detail or rationale. The interesting extension to the work on fashion and the generally high quality of the paper are the main factors in its acceptance.

Reviewer #1
Questions

1. Summary. In 3-5 sentences, describe the key ideas and experiments and their significance.
The key idea is: given an outfit image with the region belonging to one of the items cropped out, find items that are compatible with the scene and the rest of the outfit. This cropping idea makes it possible to create training data from real-world fashion outfit datasets that provide bounding boxes for the items. Compatibility is modeled as a deep similarity learning problem trained with triplets.

2. What aspects of the paper are particularly good?
- High practical impact
- Innovative problem formulation or solution
- Clear explanations and illustrations

3. Strengths. Consider the significance of key ideas, experimental validation, writing quality. Explain clearly why these aspects of the paper are valuable.
The paper is clear and motivates the problem well.
Unlike previous work on product-to-product image compatibility learning, it works on everyday, selfie-like images. The simple idea for generating compatibility training data from scene images gives the work practical value. Results are promising in two domains, fashion and home decor. The paper also explores attention in compatibility learning.

5. Weaknesses. Consider significance of key ideas, experiments, writing quality. Clearly explain why these are weak aspects of the paper, e.g. why a specific prior work has already demonstrated the key contributions, or why the experiments are insufficient to validate the claims.
- The datasets (specifically Fashion-2 and Home) seem to be proprietary. Are there any plans to release the data for reproducibility in future research?
- Some more recent state-of-the-art approaches should be tested: "Learning Type-Aware Embeddings for Fashion Compatibility" (2018) and "Learning Fashion Compatibility with Bidirectional LSTMs" (2017).
- The saliency maps in Section 5.4 do not seem conclusive (except that they ignore the face more than the deep saliency approach does). It would be interesting to reverse the qualitative results in Figure 7, i.e., to see which "scenes" match a given "product", to find out whether compatibility is based on similar backgrounds (outdoor, indoor, etc.) or on the outfit (and similarly for rooms/furniture).

6. Paper rating (pre-rebuttal)
Weak Accept

7. Justification of rating. What are the most important factors in your rating?
The paper presents a practical way to train compatibility networks, with promising results on real-life fashion and home decor images, which is of practical value for industry.

Reviewer #2
Questions

1. Summary. In 3-5 sentences, describe the key ideas and experiments and their significance.
This paper introduced an approach for learning the compatibility between clothing items from real-world images.
They used a scene embedding and a product embedding to represent the image and the items appearing in it, and used global and local compatibility measures (L2 distances) to learn the compatibilities. Their final compatibility function is a combination of the local and global distances. They used positive and negative training triplets to learn the compatibilities. They introduced the Complete The Look dataset, which is a modified version of the Shop The Look dataset.

3. Strengths. Consider the significance of key ideas, experimental validation, writing quality. Explain clearly why these aspects of the paper are valuable.
The paper introduces a new approach that extends compatibility measurement between products to real-world images. In real-world images, products can be occluded, images can be of low quality, and a product can occupy a very small proportion of the whole image. Hence, their cropping method could recover fine details for a better compatibility measurement.

4. What aspects of the paper most need improvement?
- Key ideas and techniques of the paper are difficult to understand
- Contributions are not clearly and accurately stated
- Experiments are insufficient to validate the contributions

5. Weaknesses. Consider significance of key ideas, experiments, writing quality. Clearly explain why these are weak aspects of the paper, e.g. why a specific prior work has already demonstrated the key contributions, or why the experiments are insufficient to validate the claims.
Although the paper introduced a novel approach for compatibility measurement on real-world images, it is not written clearly. The paper did not compare its method with any state-of-the-art papers, such as references [23, 41, 35] of the paper. In Figure 5 they claimed that their method could retrieve product images better than several referenced methods. However, their model uses the product image as input, so the product image has already been seen by the model and this comparison is not fair.
In general, I find Section 3 (Dataset) and Section 5.3 very hard to read. The results are not clear and the test scenario is not defined properly. The figures are of low quality and annotations are missing in the images. In line 76, the authors refer to Figure 2 with Ip and Is; however, this annotation does not appear in the images. In line 404 they introduced a function g, but the function was never defined in the paper. What type of embedding is being used?

6. Paper rating (pre-rebuttal)
Weak Reject

7. Justification of rating. What are the most important factors in your rating?
The evaluation set is not clearly defined in the paper. The input to their system at test time is not clearly defined. It is not clear how they annotated the dataset so that it captures a notion of compatibility (line 63). How was this annotation done? How do workers measure the similarity between styles and their compatibility with the scene? It is hard to judge the qualitative results, as I am not sure what the input to their model is at test time.

11. Final recommendation based on ALL the reviews, rebuttal, and discussion (post-rebuttal)
Reject

12. Recommendation confidence
Very confident

13. Final justification
The paper is not well written. The results are not compared with state-of-the-art compatibility methods. Based on how the annotation was done, the authors learned retrieval rather than compatibility.

Reviewer #3
Questions

1. Summary. In 3-5 sentences, describe the key ideas and experiments and their significance.
This paper proposed a complementary product recommendation system called Complete the Look. The system takes a cropped street fashion image and a product category to recommend, and outputs recommended products that are compatible with the given image.

2. What aspects of the paper are particularly good?
- Innovative problem formulation or solution
- Contributions clearly stated and validated
- Clear explanations and illustrations

3. Strengths.
Consider the significance of key ideas, experimental validation, writing quality. Explain clearly why these aspects of the paper are valuable.
1. It is an interesting problem to recommend product items based on street photos.
2. The pipeline for creating the dataset is novel.
3. The methods for solving the problem, based on global and local appearance compatibility, make sense.

4. What aspects of the paper most need improvement?
- Experiments are insufficient to validate the contributions

5. Weaknesses. Consider significance of key ideas, experiments, writing quality. Clearly explain why these are weak aspects of the paper, e.g. why a specific prior work has already demonstrated the key contributions, or why the experiments are insufficient to validate the claims.
Currently, the results are on a dataset curated from the STL dataset. It would be better to show results/examples on real-world images.

6. Paper rating (pre-rebuttal)
Weak Accept

7. Justification of rating. What are the most important factors in your rating?
Overall, this paper proposed a novel problem and dataset, and the proposed method works significantly better than all baselines.

9. Comments to author. Include any comments that may be useful for revision but should not be considered in the paper decision.
1. Would the model be sensitive to how the image was cropped?
2. Will the dataset be released?
3. From (3), the attention map depends on the cropped image, "c", and "p". In Figure 6, what are "c" and "p" for each row in the "A" column?

11. Final recommendation based on ALL the reviews, rebuttal, and discussion (post-rebuttal)
Accept

12. Recommendation confidence
Somewhat confident