ICDM 2018 acceptance decision: regular paper (with reviews)

Dear Wang-Cheng Kang and Julian McAuley,

Congratulations! Your paper DM464: Self-Attentive Sequential Recommendation has been selected as a full paper for the 2018 IEEE International Conference on Data Mining (ICDM'18). The selection this year was highly competitive: with 948 submissions, the acceptance rate was 8.86% for full papers and 11.08% for short papers, and the overall acceptance rate was under 20%. As a full paper, your paper will be allocated 10 pages in the proceedings (IEEE double-column format) and 20 minutes for presentation at the conference.

** The reviews are attached at the end of this message. **

This email contains a number of important instructions. Please read it carefully, since it covers many details.

- *Camera-ready papers are due by Sep. 13, 2018.* The PC spent considerable time and effort evaluating the papers, so please take their comments and suggestions into account when revising your paper before the final version is submitted. Remember that reproducibility is a central scientific principle. Therefore, please make sure that the code and the data are available (where needed, add them to a suitable repository, such as GitHub, Google Sites, or your homepage) and provide pointers to the appropriate URLs as a footnote in the final version of your paper. This will ensure that the community can benefit from your work, and may also attract additional citations to it. Information regarding formatting and submission of your final paper can be found at: https://ieeecps.org/#!/auth/login?ak=1&pid=5oyqGZmVwatArsjiAEZhpC

- *The author registration deadline is Sep. 13, 2018.* ICDM is a forum for presenting and discussing current research in data mining. Therefore, at least one author of your paper must complete the conference registration (at the full registration rate) by Sep. 13, 2018, in order for the paper to be included in the conference proceedings and the program.
Please make sure to fill in the "Paper ID" when completing the registration. The registration information can be found on the conference webpage (available from Aug. 21, 2018): http://icdm2018.org/

- *The policy for authors of multiple accepted papers:* If you authored more than one accepted paper, you (or at least one of the authors) should register for each additional paper (both conference and workshop papers). In other words, each accepted paper requires a separate registration. If one author registers more than one paper, the additional papers can be registered at the "extra paper registration rate" (500 USD per paper).

- *Student authors can apply for travel grants to offset the cost of attending the conference (see the conference web page http://icdm2018.org/ for details).*

The conference dates are Nov. 17-20, 2018, and the venue is Singapore. Please follow the conference website at http://icdm2018.org/ for more information on the program, including workshops and participation (some information will be available shortly). Visa letter information will be announced on the conference website soon.

Congratulations again, and thanks for your contributions! We look forward to seeing you at IEEE ICDM 2018 in Singapore.

Dacheng Tao and Bhavani Thuraisingham
ICDM'18 Program Co-Chairs

================================================================
--======== Review Reports ========--

The review report from reviewer #1:

*1: Is the paper relevant to ICDM?
   [_] No  [X] Yes
*2: How innovative is the paper?
   [_] 6 (Very innovative)  [_] 3 (Innovative)  [X] -2 (Marginally)  [_] -4 (Not very much)  [_] -6 (Not at all)
*3: How would you rate the technical quality of the paper?
   [_] 6 (Very high)  [_] 3 (High)  [X] -2 (Marginal)  [_] -4 (Low)  [_] -6 (Very low)
*4: How is the presentation?
   [_] 6 (Excellent)  [_] 3 (Good)  [X] -2 (Marginal)  [_] -4 (Below average)  [_] -6 (Poor)
*5: Is the paper of interest to ICDM users and practitioners?
   [X] 3 (Yes)  [_] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
   [_] 2 (High)  [X] 1 (Medium)  [_] 0 (Low)
*7: Overall recommendation
   [_] 6: must accept (in top 25% of ICDM accepted papers)
   [X] 3: should accept (in top 80% of ICDM accepted papers)
   [_] -2: marginal (in bottom 20% of ICDM accepted papers)
   [_] -4: should reject (below acceptance bar)
   [_] -6: must reject (unacceptable: too weak, incomplete, or wrong)
*8: Summary of the paper's main contribution and impact
The authors propose a novel self-attention-based sequential model for next-item recommendation. The method models the entire user sequence (without any recurrent or convolutional operations) and adaptively considers consumed items for prediction. The authors conduct experiments on both sparse and dense datasets to demonstrate the effectiveness of the proposed method.
*9: Justification of your recommendation
The authors develop a method that has been shown to perform better than state-of-the-art baselines.
*10: Three strong points of this paper (please number each point)
S1. The authors apply self-attention mechanisms to the sequential recommendation problem.
S2. The authors conduct comprehensive experiments on several benchmark datasets and compare against the relevant baselines.
*11: Three weak points of this paper (please number each point)
W1. The technical solution is based on a recent work; thus, the technical novelty of this work is limited.
*12: Is this submission among the best 10% of submissions that you reviewed for ICDM'18?
   [X] No  [_] Yes
*13: Would you be able to replicate the results based on the information given in the paper?
   [X] No  [_] Yes
*14: Are the data and implementations publicly available for possible replication?
   [X] No  [_] Yes
*15: If the paper is accepted, which format would you suggest?
   [X] Regular Paper  [_] Short Paper
*16: Detailed comments for the authors
The problem of "sequential recommendation" is different from predicting the next item a user is likely to consume. This needs to be addressed. Some of the recent work, e.g., [19], pursues the same definition as the authors of this paper.

========================================================

The review report from reviewer #2:

*1: Is the paper relevant to ICDM?
   [_] No  [X] Yes
*2: How innovative is the paper?
   [_] 6 (Very innovative)  [_] 3 (Innovative)  [X] -2 (Marginally)  [_] -4 (Not very much)  [_] -6 (Not at all)
*3: How would you rate the technical quality of the paper?
   [_] 6 (Very high)  [X] 3 (High)  [_] -2 (Marginal)  [_] -4 (Low)  [_] -6 (Very low)
*4: How is the presentation?
   [_] 6 (Excellent)  [X] 3 (Good)  [_] -2 (Marginal)  [_] -4 (Below average)  [_] -6 (Poor)
*5: Is the paper of interest to ICDM users and practitioners?
   [X] 3 (Yes)  [_] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
   [X] 2 (High)  [_] 1 (Medium)  [_] 0 (Low)
*7: Overall recommendation
   [_] 6: must accept (in top 25% of ICDM accepted papers)
   [_] 3: should accept (in top 80% of ICDM accepted papers)
   [X] -2: marginal (in bottom 20% of ICDM accepted papers)
   [_] -4: should reject (below acceptance bar)
   [_] -6: must reject (unacceptable: too weak, incomplete, or wrong)
*8: Summary of the paper's main contribution and impact
The paper introduces a model to predict the next item at each time step given the past history. Although the idea of self-attention has been around for some time now, it has never been applied to recommender systems before, making the model novel for the recommendation scenario. The idea is to use a weighted sum of the past hidden states as an extra input at each time step in the recurrent network.
*9: Justification of your recommendation
Recommender systems are well within the scope of the conference, which makes the paper quite relevant.
The only problem from my side is the lack of novelty: the authors use the idea of self-attention, which has been around and used in NLP tasks for quite some time now, but which has never been applied to recommendation before.
*10: Three strong points of this paper (please number each point)
1. The approach is simple yet promising.
2. The paper is well written.
3. The evaluation was done comprehensively.
*11: Three weak points of this paper (please number each point)
1. I find the evaluation criterion to be rather shallow, since limiting the number of testing examples during evaluation is not a great idea. To get the scores for all items at the last time step in the item consumption sequence, a simple matrix multiplication (F_t * N^T) is needed, which, considering that you train your model on a GPU, should not make much of a time difference.
*12: Is this submission among the best 10% of submissions that you reviewed for ICDM'18?
   [_] No  [X] Yes
*13: Would you be able to replicate the results based on the information given in the paper?
   [_] No  [X] Yes
*14: Are the data and implementations publicly available for possible replication?
   [_] No  [X] Yes
*15: If the paper is accepted, which format would you suggest?
   [X] Regular Paper  [_] Short Paper
*16: Detailed comments for the authors
Revisions:
- "being on the one hand able to draw context" -> "being able to draw context"
- "randomly sample 100 negative items, and rank these items" -> "randomly sample 100 negative items and rank these items"
- "however the computation can be" -> "however, the computation can be"

========================================================

The review report from reviewer #3:

*1: Is the paper relevant to ICDM?
   [_] No  [X] Yes
*2: How innovative is the paper?
   [_] 6 (Very innovative)  [X] 3 (Innovative)  [_] -2 (Marginally)  [_] -4 (Not very much)  [_] -6 (Not at all)
*3: How would you rate the technical quality of the paper?
   [_] 6 (Very high)  [X] 3 (High)  [_] -2 (Marginal)  [_] -4 (Low)  [_] -6 (Very low)
*4: How is the presentation?
   [_] 6 (Excellent)  [_] 3 (Good)  [X] -2 (Marginal)  [_] -4 (Below average)  [_] -6 (Poor)
*5: Is the paper of interest to ICDM users and practitioners?
   [X] 3 (Yes)  [_] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
   [_] 2 (High)  [X] 1 (Medium)  [_] 0 (Low)
*7: Overall recommendation
   [_] 6: must accept (in top 25% of ICDM accepted papers)
   [X] 3: should accept (in top 80% of ICDM accepted papers)
   [_] -2: marginal (in bottom 20% of ICDM accepted papers)
   [_] -4: should reject (below acceptance bar)
   [_] -6: must reject (unacceptable: too weak, incomplete, or wrong)
*8: Summary of the paper's main contribution and impact
In this manuscript, the authors propose a self-attention-based sequential model for next-item recommendation. The proposed model consists of an embedding layer, a self-attention layer, and a prediction layer. In the item embedding layer, a trainable position embedding matrix P is added so that the model can be aware of the positions of previous items. A self-attention layer is then connected to a feed-forward network. Several schemes for training neural networks, such as residual connections and dropout layers, have also been adopted to propagate the last visited item's embedding to the final layer and to prevent over-fitting. Comprehensive experiments are conducted to evaluate the proposed model on three datasets, where it outperforms the other baseline models. A comprehensive study has also been done to answer the proposed research questions, such as the influence of various components, training efficiency, and scalability.
*9: Justification of your recommendation
Comprehensive experimental results are presented. Implementation details are also provided. Several research questions that readers may raise before the experimental section are answered.
The proposed model is compared with many other baseline methods. Overall, the quality of the work in the manuscript is good, despite some missing details and a limited rationale for why the authors adopt the self-attention components.
*10: Three strong points of this paper (please number each point)
(1.) The proposed model outperforms the others in terms of hit rate and NDCG.
(2.) The proposed model has higher training efficiency and greater scalability.
(3.) A study on the different variations of the proposed model is presented to show the effective contribution of each component.
*11: Three weak points of this paper (please number each point)
(1.) It seems that some layers and components are included in the model without an explicit reason, even though the empirical results have demonstrated their effectiveness. It is suggested that the authors include more insight into why they include these layers in their model.
(2.) In spite of the examples for the attention matrix, it is also suggested to present some representative sequential actions to demonstrate that the model can actually learn the sequential patterns.
(3.) The proposed model has been compared with RNN- and CNN-based baseline models. However, the results would be more convincing if the authors compared their proposed model with other attention-based models.
*12: Is this submission among the best 10% of submissions that you reviewed for ICDM'18?
   [X] No  [_] Yes
*13: Would you be able to replicate the results based on the information given in the paper?
   [_] No  [X] Yes
*14: Are the data and implementations publicly available for possible replication?
   [_] No  [X] Yes
*15: If the paper is accepted, which format would you suggest?
   [X] Regular Paper  [_] Short Paper
*16: Detailed comments for the authors
(1.) Some notations may not be defined before they are used in the manuscript. For example, in Section III, does the superscript i in Ei and Msi denote a row vector?
(2.) In Equation 1, it is not clear how the position embedding matrix works. Since both the position embedding matrix P and the item embedding matrix M are trained parameters, it seems that there is no difference between adding them together and using only M. The authors may need to give more description of how they encode the position information with the position embedding matrix.
(3.) It is also suggested that the authors include the motivation behind self-attention and a possible explanation of why it can improve the performance.
(4.) The experimental results are based on the testing set, and the split of the dataset into training and testing sets is performed only once. It is suggested that the authors repeat the experiments multiple times and report the average, to make sure that the results are not by chance.
(5.) In the evaluation metrics, how do the authors define the graded relevance (i.e., the gain)?
(6.) A study on the self-attention weight matrix is presented in Section IV-H. However, it is confusing what the self-attention weight matrix is here. It would be clearer for the authors to define it in Section III.
(7.) The problem is to predict the most recent action item based on the previous sequential action history. The order of actions matters in the experimental setting and in the model training. It seems not valid to split the dataset by choosing the most recent action as the testing set and the second most recent action as the validation set. The action in the testing set may not be the most recent one for the training set. It may be more plausible if the dataset is randomly split into training, validation, and testing sets, and only the most recent action is used in either the validation or the testing set.

========================================================

The review report from reviewer #4:

*1: Is the paper relevant to ICDM?
   [_] No  [X] Yes
*2: How innovative is the paper?
   [_] 6 (Very innovative)  [_] 3 (Innovative)  [X] -2 (Marginally)  [_] -4 (Not very much)  [_] -6 (Not at all)
*3: How would you rate the technical quality of the paper?
   [_] 6 (Very high)  [X] 3 (High)  [_] -2 (Marginal)  [_] -4 (Low)  [_] -6 (Very low)
*4: How is the presentation?
   [_] 6 (Excellent)  [_] 3 (Good)  [X] -2 (Marginal)  [_] -4 (Below average)  [_] -6 (Poor)
*5: Is the paper of interest to ICDM users and practitioners?
   [_] 3 (Yes)  [X] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
   [_] 2 (High)  [X] 1 (Medium)  [_] 0 (Low)
*7: Overall recommendation
   [_] 6: must accept (in top 25% of ICDM accepted papers)
   [X] 3: should accept (in top 80% of ICDM accepted papers)
   [_] -2: marginal (in bottom 20% of ICDM accepted papers)
   [_] -4: should reject (below acceptance bar)
   [_] -6: must reject (unacceptable: too weak, incomplete, or wrong)
*8: Summary of the paper's main contribution and impact
This work studies a novel self-attention-based sequential model, SASRec, for next-item recommendation. The contribution and impact are good.
*9: Justification of your recommendation
The problem is interesting, and the motivation is strong. The proposed method can address the problem well.
*10: Three strong points of this paper (please number each point)
1. The problem is interesting.
2. The motivation is strong.
3. The technical part is good.
*11: Three weak points of this paper (please number each point)
1. The writing should be further improved.
*12: Is this submission among the best 10% of submissions that you reviewed for ICDM'18?
   [X] No  [_] Yes
*13: Would you be able to replicate the results based on the information given in the paper?
   [X] No  [_] Yes
*14: Are the data and implementations publicly available for possible replication?
   [X] No  [_] Yes
*15: If the paper is accepted, which format would you suggest?
   [X] Regular Paper  [_] Short Paper
*16: Detailed comments for the authors
This work studies a novel self-attention-based sequential model, SASRec, for next-item recommendation. I like this paper.

========================================================

The review report from reviewer #5:

*1: Is the paper relevant to ICDM?
   [_] No  [X] Yes
*2: How innovative is the paper?
   [_] 6 (Very innovative)  [X] 3 (Innovative)  [_] -2 (Marginally)  [_] -4 (Not very much)  [_] -6 (Not at all)
*3: How would you rate the technical quality of the paper?
   [_] 6 (Very high)  [X] 3 (High)  [_] -2 (Marginal)  [_] -4 (Low)  [_] -6 (Very low)
*4: How is the presentation?
   [X] 6 (Excellent)  [_] 3 (Good)  [_] -2 (Marginal)  [_] -4 (Below average)  [_] -6 (Poor)
*5: Is the paper of interest to ICDM users and practitioners?
   [X] 3 (Yes)  [_] 2 (May be)  [_] 1 (No)  [_] 0 (Not applicable)
*6: What is your confidence in your review of this paper?
   [X] 2 (High)  [_] 1 (Medium)  [_] 0 (Low)
*7: Overall recommendation
   [_] 6: must accept (in top 25% of ICDM accepted papers)
   [X] 3: should accept (in top 80% of ICDM accepted papers)
   [_] -2: marginal (in bottom 20% of ICDM accepted papers)
   [_] -4: should reject (below acceptance bar)
   [_] -6: must reject (unacceptable: too weak, incomplete, or wrong)
*8: Summary of the paper's main contribution and impact
The paper focuses on sequential recommendation, which is a very important problem in the recommendation literature. This paper is the first work to apply the self-attention mechanism to this recommendation problem. The proposed model significantly outperforms state-of-the-art deep-learning-based methods for sequential recommendation. The experiments are comprehensive, and the results are impressive.
*9: Justification of your recommendation
The problem is important, and the proposed idea is interesting. The technical part is solid, and the experiments are comprehensive.
*10: Three strong points of this paper (please number each point)
1.
The idea of applying the self-attention mechanism to sequential recommendation is very interesting and straightforward.
2. The technical part is solid.
3. The experiments are very comprehensive, and the performance gains are very impressive as well.
*11: Three weak points of this paper (please number each point)
1. The paper applies the self-attention mechanism to a recommendation problem; thus, the novelty may be a bit limited in this sense.
2. One important relevant work is missing, which decreases the contribution of this work: the AAAI'18 paper "Attention-Based Transactional Context Embedding for Next-Item Recommendation".
3. In the experiments, the authors did not compare their model with the AAAI'18 paper either, which weakens the persuasiveness of the results.
*12: Is this submission among the best 10% of submissions that you reviewed for ICDM'18?
   [_] No  [X] Yes
*13: Would you be able to replicate the results based on the information given in the paper?
   [_] No  [X] Yes
*14: Are the data and implementations publicly available for possible replication?
   [_] No  [X] Yes
*15: If the paper is accepted, which format would you suggest?
   [X] Regular Paper  [_] Short Paper
*16: Detailed comments for the authors
In this paper, the authors work on sequential recommendation, which is an important problem. By applying the self-attention mechanism, the proposed method can address the problems facing existing deep-learning-based methods for sequential recommendation. Borrowing ideas from the state-of-the-art machine translation model, the authors design embedding layers, self-attention blocks, and a prediction layer to model sequential recommendation, which can automatically assign weights to previous items when predicting the next item. The authors also conduct very comprehensive experiments on four real-world datasets, and the performance gains compared to state-of-the-art baselines are very impressive, which demonstrates the effectiveness of the proposed model.
Generally speaking, the paper is very good. The idea is straightforward. The technical part is solid, and the discussion of the proposed model alongside other deep-learning-based methods is insightful and helps us understand the model better. The comprehensive experiments, including performance comparisons, an ablation study, efficiency and scalability analyses, and a case study, make the paper very impressive and convincing. Besides, the presentation is good. Though the novelty of the idea may be a bit limited, considering that the paper just applies the self-attention mechanism to sequential recommendation, the whole modeling process does make sense.

However, there is one major concern: a relevant paper published in AAAI 2018 is missing, titled "Attention-Based Transactional Context Embedding for Next-Item Recommendation". In that AAAI paper, the authors also utilized the attention mechanism for next-item recommendation, which is the same task as in this paper. Therefore, if the authors can discuss the differences from the AAAI work and report performance comparisons, the paper will be much more convincing. In summary, despite the above concern, this paper is of high quality and should be accepted.

========================================================

Meta Review:
An adaptation of the self-attention model to the problem of sequential recommendation, which consists in predicting the next items in a sequence. On the positive side, the experimental evaluation is extensive and demonstrates the effectiveness of the proposed architecture. On the negative side, novelty seems low, as the proposed architecture largely draws on building blocks from the current literature. Some choices are poorly justified.
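Editor's note: two technical points recur in the reviews above, namely reviewer #2's remark that scoring all items at the final time step is a single matrix multiplication (F_t * N^T) rather than ranking against 100 sampled negatives, and reviewer #3's question about the graded relevance (gain) used in NDCG. The sketch below makes both concrete. It uses random placeholder embeddings and hypothetical names (F_t, N, target), not the authors' trained model or code; with binary relevance, the gain is 1 for the ground-truth item and 0 otherwise, so NDCG@k collapses to 1/log2(rank + 2) on a hit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned parameters -- illustration only, not the authors' model.
num_items, d, k = 1000, 64, 10
F_t = rng.normal(size=d)                 # user's final-step hidden state
N = rng.normal(size=(num_items, d))      # item embedding matrix

# Full ranking, as reviewer #2 suggests: one matrix product F_t * N^T
# scores every item in the catalog at once.
scores = N @ F_t                         # shape: (num_items,)

# Sampled protocol discussed in the reviews: rank the ground-truth item
# against 100 randomly sampled negative items.
target = 42
negatives = rng.choice(np.delete(np.arange(num_items), target),
                       size=100, replace=False)
candidates = np.concatenate(([target], negatives))
cand_scores = scores[candidates]

# 0-based rank of the target among the 101 candidates.
rank = int((cand_scores > cand_scores[0]).sum())

# Hit Rate@k: 1 if the target ranks in the top k. With binary relevance
# (gain 1 for the target, 0 otherwise -- reviewer #3's question),
# NDCG@k reduces to 1/log2(rank + 2) on a hit and 0 otherwise.
hit_at_k = 1.0 if rank < k else 0.0
ndcg_at_k = 1.0 / np.log2(rank + 2) if rank < k else 0.0
```

The sampled protocol trades exactness for evaluation speed; as reviewer #2 notes, on a GPU the full product is cheap, although memory for very large item catalogs can still be a consideration.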