Reviewer #1 Questions

1. Summary and contributions: Briefly summarize the paper and its contributions.
The summary and the contributions of the paper are as follows:
1. Proposes a novel smooth sparsity measurement for attention matrices and demonstrates its effectiveness in capturing the local inductive bias of AR attention.
2. Analyzes the rank of attention matrices using SVD-based low-rank approximation and shows that AR attention stores richer data dynamics than AE attention.
3. Empirically compares vanilla and variant AE/AR attention models on five popular benchmarks, confirming the theoretical advantages of AR attention with better overall performance.

2. Strong points. List three to five strong points about the paper. Please be precise and explicit. Clearly explain the value and nature of the contribution.
1. A theoretical analysis of AE/AR attention is provided.
2. The empirical evaluation of the paper is good.
3. The clarity and presentation of the paper are good.

3. Weak points. List three to five weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered as a contribution. Please be polite, specific, and constructive.
1. The scope of the analysis is limited.
2. The local inductive bias should be described more clearly.
3. Ablation studies are needed.
4. More diverse datasets should be used.
5. Other attention metrics would be useful as well.
6. An analysis of the performance trends is needed.

4. Detailed Evaluation. Please provide detailed feedback about the strengths and the weaknesses of the paper and support your overall rating. You may talk about significance, technical depth, novelty, reproducibility, relevance to the community, and potential ethical considerations. Please be polite, specific, and constructive.
This paper presents a valuable theoretical analysis and empirical evaluation of AE/AR attention in sequential recommendation.
The findings contribute to a better understanding of attention behavior and provide practical insights for model design. However, the overall contribution is incremental, and the paper could benefit from exploring a wider range of analysis aspects and conducting more comprehensive ablation studies.

The detailed feedback on the strengths of the paper is as follows:
1. Theoretical Rigor: The paper provides a comprehensive theoretical analysis of AE/AR attention matrices, exploring both sparse local inductive bias and low-rank approximation. The analysis is well motivated and supported by mathematical properties and upper bounds.
2. Empirical Evaluation: The paper conducts extensive empirical experiments on five popular benchmarks, comparing both vanilla and variant AE/AR attention models. The evaluation is thorough and covers a wide range of design choices and model variants.
3. Clarity and Presentation: The paper is well written and clearly presents the research questions, methodology, results, and conclusions. The figures and tables effectively illustrate the key findings.

The detailed feedback on the weaknesses of the paper is as follows:
1. Limited Scope of Analysis: The paper primarily analyzes the attention matrix from the perspectives of sparsity and rank. Other aspects of attention behavior, such as temporal dynamics or interaction patterns, could be explored further.
2. Local Inductive Bias: The paper uses the term "local inductive bias" to describe the tendency of AR attention to focus on nearby items in the sequence. It would be helpful to state explicitly that this bias is both "sparse" and "local", to distinguish it from other forms of local attention such as Local-AE.
3. Ablation Studies: Ablation studies that isolate the impact of individual components of the attention mechanism (e.g., causal masking, layer normalization) would clarify the specific contribution of each component to the overall performance.
4. Potential Bias in Dataset Selection: The paper uses datasets with varying sparsity levels, but the results might be influenced by the specific characteristics of these datasets. Additional experiments on more diverse datasets would strengthen the generalizability of the findings.
5. Analysis of Other Attention Metrics: The paper primarily uses sparsity and rank as metrics for analyzing attention matrices. It would be interesting to explore other metrics, such as attention entropy or the attention distribution, to gain further insight into the behavior of AE and AR attention.
6. Analysis of Performance Trends: The paper reports that AR attention consistently outperforms AE attention across datasets and model variants. It would be beneficial to analyze the underlying trends and the factors contributing to this advantage. For example, does the performance gap widen as the sparsity of the dataset increases?

5. Overall rating. Weak Accept (Probable accept, unless convinced otherwise)
8. Reviewer confidence. Knowledgeable: I have read papers on this topic.

Reviewer #4 Questions

1. Summary and contributions:
This paper addresses the debate between auto-encoding (AE) and auto-regressive (AR) attention mechanisms in sequential recommendation models. It provides a theoretical analysis of AE/AR attention matrices, focusing on sparsity and low-rank approximation, and shows that AR attention exhibits a sparse local inductive bias, which is beneficial in sparse recommendation scenarios. Empirical experiments on five popular datasets show AR models outperforming AE models in most cases. The findings suggest that AR models are superior for sequential recommendation, given their stronger performance and their ability to model complex data dynamics.

2. Strong points.
1) The problem studied in this paper, a comparison of AE and AR model types, is interesting and significant. With large language models increasingly applied in recommendation, this work offers strong insights and reference value for future researchers.
2) The paper combines theoretical analysis with empirical results that verify each other, which makes the work very solid.
3) The smooth sparsity measurement for attention matrices designed in this paper is very interesting and has significant implications for measuring the sparsity of self-attention.
4) The experimental design of this paper is rigorous and comprehensive.

3. Weak points.
1) My main concern is that the paper should not only articulate its three findings but also propose solutions, or more targeted recommendations for future work, based on these findings.
2) How do the conclusions of this paper compare with existing analyses that contrast AE and AR? If they differ, why are the conclusions in this paper more accurate? I recommend that the authors address this in the text.
3) Several formatting issues need attention, such as a formatting problem in the second paragraph of the Introduction and a missing equation number for the probability output equation of the sequential recommendation model.
4) I am curious why the title emphasizes the "lonely neighbourhood", when this seems to be only one of the three findings and does not fully represent the content of the paper.

4. Detailed Evaluation.
Refer to weak points.

5. Overall rating. Weak Accept (Probable accept, unless convinced otherwise)
8. Reviewer confidence. Knowledgeable: I have read papers on this topic.

Reviewer #5 Questions

1. Summary and contributions:
The paper explores the comparative effectiveness of Auto-Encoding (AE) and Auto-Regressive (AR) self-attention mechanisms in sequential recommendation systems. It provides a theoretical framework based on sparsity and rank-k approximation and supports the analysis with extensive empirical evaluations on five benchmark datasets. The study finds that AR models generally outperform AE models, particularly in sparse and short-sequence scenarios, and discusses the implications for the design of future self-attentive recommenders.

2. Strong points.
1. The paper presents a solid theoretical foundation for understanding the distinct characteristics of AE and AR self-attention mechanisms, addressing key aspects such as sparsity and rank-k approximation.
2. It provides extensive empirical validation, demonstrating the superiority of AR models on 4 out of 5 datasets, which strengthens the theoretical claims.
3. By analyzing various design choices and model variants, the paper ensures that its findings are robust across different configurations.

3. Weak points.
1. The paper identifies long-term visit effects in the Yelp dataset but lacks a detailed explanation and potential solutions.
2. The paper uses an older implementation of BERT4Rec, which may affect the fairness of its comparison with AR models. According to [1], BERT4Rec implemented with Hugging Face can achieve better results.
[1] A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation

4. Detailed Evaluation.
The strong and weak points can be seen above.

5. Overall rating. Weak Accept (Probable accept, unless convinced otherwise)
8. Reviewer confidence. Knowledgeable: I have read papers on this topic.

Reviewer #6 Questions

1. Summary and contributions:
The paper offers a comprehensive theoretical analysis of the AE/AR attention matrix, focusing on two key aspects: (1) sparse local inductive bias, or neighbourhood effects, and (2) low-rank approximation. The authors' carefully chosen analytical metrics demonstrate that AR attention exhibits sparse neighbourhood effects, making it well suited to general sparse recommendation scenarios. Furthermore, extensive empirical experiments compare vanilla and variant AE/AR attention models across five popular benchmarks, with AR attention models performing better overall.

2. Strong points.
1. The authors present a comprehensive theoretical analysis of the AE/AR attention matrix, focusing on sparse local inductive bias and low-rank approximation.
2. The authors conduct extensive empirical experiments, comparing vanilla and variant AE/AR attention models across five popular benchmark datasets.

3. Weak points.
1. Why does long-term visiting cause AR to behave this way on Yelp in Section 5.2? The authors should provide a more detailed explanation.
2. In Section 3, should ‘we first drive an optimal sparsity measurement and its properties’ read ‘derive’? If not, what are the authors trying to convey?
3. The explanation of setting p to 2 in Equation 3 should be placed together with the equation for easier understanding.
4. What does L1/L2 refer to in Section 3.2? The authors should explain this explicitly.

4. Detailed Evaluation.
The authors present a comprehensive theoretical analysis of the AE/AR attention matrix, addressing both sparse local inductive bias and low-rank approximation, supported by sufficient experimental validation.

5. Overall rating. Weak Accept (Probable accept, unless convinced otherwise)
8. Reviewer confidence. Knowledgeable: I have read papers on this topic.