------------------------- METAREVIEW ------------------------
The authors present a detailed analysis of the sensitivity of recommender algorithms to different perturbation models. This is an understudied problem, especially in the context of recommender systems, and the work advances our understanding of the techniques in general. The reviewers were mixed on the motivation and justification of the specific perturbation model and on some of the analysis. The authors should address the reviewers' comments before the final submission.

----------------------- REVIEW 1 ---------------------
SUBMISSION: 8774
TITLE: Rank List Sensitivity of Recommender Systems to Interaction Perturbations
AUTHORS: Sejoon Oh, Berk Ustun, Julian McAuley and Srijan Kumar

----------- Overall evaluation -----------
SCORE: -1 (weak reject)
----- TEXT:
The paper presents a methodology to test the sensitivity of a ranking system using perturbations to the input features. In particular, the paper focuses on perturbations arising from an interaction graph, where the goal is adversarial: to remove from the training data the most important features that lead to model instability. The manuscript provides comparisons and experimental results against stability measured under random perturbations, using a proposed metric for rank-list comparisons. Overall, the paper lacks motivation and does not provide a firm takeaway. The results are not surprising given the adversarial nature of the perturbations, and the paper lacks empirical statistics for the metrics compared.

----------- Strengths and reasons to accept -----------
• The paper is generally well written and easy to follow. The authors motivate the need to study model stability, and highlight the impact of sequential features on model predictions for recommendations.

----------- Weaknesses and limitations -----------
• The main concern with the current draft is a lack of clear motivation.
For example, it is unclear what the end goal of the paper is. Given the lack of theoretical comparisons and statistical sensitivity analysis, the proposed method cannot offer statistical or empirical guarantees. Further, given the adversarial nature of the chosen perturbations, the proposed method is bound to show more model instability than random perturbations do. Therefore, the claims of the paper, although true, are more or less expected by default.
• The authors should provide more statistics about the metrics in the experimental results, e.g., Table 3. It would be helpful to show the variance of each method, as well as statistical significance, to assess whether the changes are indeed meaningful.
• A more common perturbation of sequential features is re-ordering, which in fact occurs more often than the adversarial perturbation the paper focuses on. It would be great if the authors considered this as an additional perturbation methodology and compared it with their proposed method.

----------------------- REVIEW 2 ---------------------
SUBMISSION: 8774
TITLE: Rank List Sensitivity of Recommender Systems to Interaction Perturbations
AUTHORS: Sejoon Oh, Berk Ustun, Julian McAuley and Srijan Kumar

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
This paper proposes Rank List Sensitivity (RLS) to quantify the stability of recommender systems, and then devises CASPER to identify perturbations that induce higher instability in a given recommender system. The paper is clearly written; it argues for the importance of the stability of the entire list rather than just the stability of the correct items, which is convincing and innovative. The definition of the new metric RLS is in line with the central idea of the paper and is easy to calculate. Meanwhile, CASPER seems effective at selecting perturbations that induce high instability in the system.
1.
Although RLS can be calculated using two similarities, RBO and Top-K Jaccard similarity do not appear to be otherwise connected. There is no clear description of the scenarios in which each of the two similarities should be used. In each experiment in this paper, only one similarity was used to measure the impact of perturbations. If the two similarities reflect properties of different aspects of the recommender system, the current experimental results look incomplete. If, as in this paper, using one similarity per experiment is sufficient to achieve what the experiment is intended to verify, what were the criteria for selecting that similarity rather than the other?
2. This paper presents RLS and illustrates the poor stability of recommender systems, but it does not suggest possible improvements for the nature of the problem that RLS reflects. It would be more appropriate for the authors to analyse in depth and detail which capabilities missing from the recommender systems lead to the instability captured by RLS, thus providing more insight for later research.

----------- Strengths and reasons to accept -----------
1. This paper introduces Rank List Sensitivity to measure the stability of different recommender systems. The motivation is innovative compared to methods that focus only on accuracy.
2. The paper is persuasively presented and clearly written.
3. The proposed method is simple and easy to implement, and a detailed analysis of its complexity is also presented.

----------- Weaknesses and limitations -----------
1. Although RLS can be calculated using two similarities, RBO and Top-K Jaccard similarity do not appear to be otherwise connected. There is no clear description of the scenarios in which each of the two similarities should be used.
2.
This paper presents RLS and illustrates the poor stability of recommender systems, but it does not suggest a possible improvement plan for the essence of the problem reflected by the new metric.

----------------------- REVIEW 3 ---------------------
SUBMISSION: 8774
TITLE: Rank List Sensitivity of Recommender Systems to Interaction Perturbations
AUTHORS: Sejoon Oh, Berk Ustun, Julian McAuley and Srijan Kumar

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
This paper targets an interesting problem in recommender systems: prediction results may differ when the training dataset is perturbed, even with minor changes. To deal with this problem, the paper proposes a metric for measuring the instability of a recommender system and then proposes the CASPER method for testing that instability. The paper also lists some remaining weaknesses of the studied problem, which is good for further research.

----------- Strengths and reasons to accept -----------
S1. The motivation is very interesting for recommender systems. Data perturbations have been studied for many years to probe the stability of deep learning models in various areas such as image retrieval. For recommender systems, existing work studies user-level/item-level perturbations, which do not perturb the top-k lists of the same users. Thus, studying interaction-level perturbations is interesting, since it is very common for the input training data to differ at the interaction level due to data-processing issues in most real-world recommendation scenarios.
S2. This paper studies stability against random perturbations at the interaction level. From its findings on the instability of models under minimal random perturbations, some intuitive insights are proposed, which are valuable for understanding deep learning models during training.
S3.
To tackle the problem, the authors propose the CASPER method to find perturbations that lead to even higher instability, which helps characterise the lowest stability a model can exhibit. Although the method has some weaknesses, the authors also list these issues for further discussion and future work, which is good practice for a research problem.
S4. The experimental settings are sound and easy to follow.

----------- Weaknesses and limitations -----------
W1. It would be better to list more real-world applications that could benefit from such interaction-level perturbations. The current submission is unclear on this point.
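[Editor's note] To make concrete Reviewer 2's point that the two RLS similarities capture different aspects of rank-list stability, below is a minimal sketch of Top-K Jaccard and a truncated Rank-Biased Overlap (RBO, following the standard Webber et al. definition). The function names and the truncation choice are this sketch's own assumptions, not the paper's code.

```python
def topk_jaccard(list_a, list_b, k):
    """Jaccard similarity of the top-k item sets of two rank lists.
    Set-based: insensitive to ordering within the top k."""
    a, b = set(list_a[:k]), set(list_b[:k])
    return len(a & b) / len(a | b)

def rbo(list_a, list_b, p=0.9):
    """Truncated Rank-Biased Overlap: averages set overlap at every
    depth d, geometrically down-weighted by p**(d-1), so agreement
    near the top of the lists counts more."""
    depth = min(len(list_a), len(list_b))
    score = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(list_a[:d]) & set(list_b[:d]))
        score += (p ** (d - 1)) * (overlap / d)
    return (1 - p) * score

# Hypothetical rank lists before/after a perturbation, top-2 swapped:
before = ["item1", "item2", "item3", "item4"]
after  = ["item2", "item1", "item3", "item4"]

print(topk_jaccard(before, after, k=2))  # 1.0: same top-2 as a *set*
print(rbo(before, after))                # lower than for identical lists
```

This also suggests a selection criterion the reviewer asks for: Top-K Jaccard suits fixed-size recommendation slates where only membership matters, while RBO suits full-list stability where the position of each item matters.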