Paper Decision
Program Chairs, 21 Jan 2024, 23:40 (modified: 22 Jan 2024, 22:39)
Decision: Accept
Comment: This paper focuses on learning-to-rank debiasing by decomposing the relevance and observation factors. The topic is highly relevant to the search track. The key contributions include an attention mechanism to capture hidden correlations among user-item features and a regularization term to guide the disentanglement; both factors are largely overlooked by the existing literature. The concerns raised include unclear motivation, a lack of additional baselines, and limited analysis. Most of these concerns are well addressed in the author rebuttal, and the remaining issues can be fixed with a simple revision. Overall, the pros seem to outweigh the cons, and an acceptance would be delivered given there are sufficient slots.

Response to all reviewers
Authors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more), 15 Dec 2023, 03:41 (modified: 15 Dec 2023, 03:46)
Comment: We summarize our responses and the results of the suggested experiments here. We also respond to every specific concern of each reviewer in individual comments below. As one of the main concerns lies in the novelty of our paper, we summarize our novelties and contributions as follows. (i) Summarizing position bias and popularity bias into a single observation factor and proposing a regularization term for the popular two-tower architecture. We want to highlight that this solution for jointly addressing position bias and popularity bias is easy to implement and well suited to the two-tower architectures widely adopted on real-world platforms such as [1][2], as summarized in [3].
This characteristic renders our model easily deployable in real-world use cases, alleviating concerns about heavy computational loads and facilitating practical implementation (as discussed in Appendix D). (ii) Using attention models to decouple the features for observation estimation and relevance estimation. We leverage an attention-based two-tower architecture to autonomously disentangle the user-item features pertaining to the relevance and observation factors, and we supervise this disentanglement process by introducing a regularization term that emphasizes the conditional independence between the observation and relevance factors. This approach represents a departure from prior methodologies and stands as a distinctive contribution of our study.

[1] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction.
[2] Deep Interest Network for Click-Through Rate Prediction.
[3] IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System.

Another suggestion is to apply more recent methods as baselines. We have included three recent papers [4][5][6] as baselines in our paper. Due to time limitations, we report results under the Yahoo (UBM), LETOR (UBM), and Adressa (UBM) settings introduced in Table 2.

Yahoo (UBM):

| Ranker | Debiasing Method | MAP | N@3 | N@5 | N@10 |
|---|---|---|---|---|---|
| InfoRanker | InfoRank | 0.845 | 0.736 | 0.739 | 0.779 |
| InfoRanker | UPE | 0.844 | 0.721 | 0.710 | 0.750 |
| InfoRanker | Vectorization | 0.841 | 0.698 | 0.701 | 0.744 |
| LambdaMART | PRS | 0.838 | 0.717 | 0.727 | 0.760 |
| DNN | InfoRank | 0.828 | 0.683 | 0.696 | 0.734 |
| DNN | UPE | 0.823 | 0.680 | 0.697 | 0.732 |
| DNN | Vectorization | 0.824 | 0.682 | 0.694 | 0.729 |

LETOR (UBM):

| Ranker | Debiasing Method | MAP | N@3 | N@5 | N@10 |
|---|---|---|---|---|---|
| InfoRanker | InfoRank | 0.650 | 0.380 | 0.460 | 0.541 |
| InfoRanker | UPE | 0.642 | 0.378 | 0.449 | 0.536 |
| InfoRanker | Vectorization | 0.640 | 0.377 | 0.444 | 0.534 |
| LambdaMART | PRS | 0.633 | 0.367 | 0.422 | 0.509 |
| DNN | InfoRank | 0.637 | 0.360 | 0.416 | 0.499 |
| DNN | UPE | 0.629 | 0.356 | 0.416 | 0.495 |
| DNN | Vectorization | 0.628 | 0.357 | 0.412 | 0.492 |

Adressa (UBM):

| Ranker | Debiasing Method | MAP | N@3 | N@5 | N@10 |
|---|---|---|---|---|---|
| InfoRanker | InfoRank | 0.801 | 0.691 | 0.715 | 0.739 |
| InfoRanker | UPE | 0.794 | 0.677 | 0.705 | 0.728 |
| InfoRanker | Vectorization | 0.795 | 0.678 | 0.703 | 0.729 |
| LambdaMART | PRS | 0.796 | 0.671 | 0.714 | 0.734 |
| DNN | InfoRank | 0.786 | 0.667 | 0.692 | 0.725 |
| DNN | UPE | 0.782 | 0.660 | 0.681 | 0.719 |
| DNN | Vectorization | 0.786 | 0.665 | 0.687 | 0.721 |

Here, Vectorization is proposed in [5] and UPE in [6]; we combine each of them with both the DNN ranker and the InfoRank ranker. PRS is proposed in [4]; we combine it only with LambdaMART, as it is originally designed for pair-wise rankers. For ease of comparison, we also include the results of our InfoRank in the tables. These results further verify the superiority of InfoRank. We will complete Tables 2 and 3 with the above new baselines in our revision.

[4] Unbiased Learning to Rank via Propensity Ratio Scoring. 2020.
[5] Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank. 2022.
[6] Unconfounded Propensity Estimation for Unbiased Ranking. 2023.

Official Review of Submission262 by Reviewer uAZq
Reviewer uAZq, 01 Dec 2023, 00:26 (modified: 01 Dec 2023, 22:25)
Review: This paper proposes a new unbiased learning-to-rank model, named InfoRank, via conditional mutual information minimization. The model is interesting and clearly introduced. Various experiments are conducted to show its effectiveness.
Some strengths: This paper focuses on unbiased learning to rank, a classic and important research problem in information retrieval scenarios. The method section of the paper is detailed, and the appendix also includes a complete proof process. The authors conducted various experiments to verify the effectiveness of the proposed model across multiple datasets with different settings. However, there are also some weaknesses: In the experimental section, the paper does not compare the proposed method with state-of-the-art related work, even though the authors cited some recent related works [30]. The introduction of recent works in the related work section is insufficient, citing only one paper from 2022 and one from 2023. Questions: Concerns about the choice of baselines: in Tables 2 and 3, did the authors compare whether the debiasing strategy of InfoRank achieves the best performance under different ranker settings?
Ethics Review Flag: No. Ethics Review Description: No ethical issues. Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community. Novelty: 5. Technical Quality: 4. Reviewer Confidence: 2: The reviewer is willing to defend the evaluation, but it is likely that the reviewer did not understand parts of the paper.

Response to reviewer uAZq
Authors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more), 15 Dec 2023, 03:36 (modified: 15 Dec 2023, 03:47)
Comment: In the experimental section, the paper does not compare the proposed method with state-of-the-art related work, even though the author cited some recent related works [1]. [1] Unconfounded Propensity Estimation for Unbiased Ranking. 2023. Thanks for your suggestion. We agree that incorporating recent papers as additional baseline methods would make our paper more solid. Therefore, we have included three recent papers [1][2][3] as baselines in our paper.
Due to time limitations, we report results under the Yahoo (UBM), LETOR (UBM), and Adressa (UBM) settings introduced in Table 2; the tables are identical to those in our response to all reviewers above. Here, UPE is proposed in [1] and Vectorization in [2]; we combine each of them with both the DNN ranker and the InfoRank ranker. PRS is proposed in [3]; we combine it only with LambdaMART, as it is originally designed for pair-wise rankers. For ease of comparison, we also include the results of our InfoRank in the tables. These results further verify the superiority of InfoRank. We will complete Tables 2 and 3 with the above new baselines in our revision. [2] Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank. 2022. [3] Unbiased Learning to Rank via Propensity Ratio Scoring. 2020.
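For readers unfamiliar with the metrics in these tables, N@k denotes NDCG@k. A minimal sketch of how such a value is computed, using hypothetical graded relevance labels rather than the actual benchmark data:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked items."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Hypothetical relevance labels of one ranked list (not from the benchmarks above).
ranking = [2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranking, 3), 3))  # prints 0.74
```

MAP is computed analogously from binary relevance labels, averaging the precision at each relevant position over queries.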
The introduction of recent works in the related work section is insufficient, citing only one paper from 2022 and one from 2023. Thanks for your suggestion. We will include more recent papers in our related work section. In Tables 2 and 3, did the author compare whether the debiasing strategy of InfoRank achieves the best performance under different ranker settings? In Tables 2 and 3, we compare InfoRank (Debiasing) with two different rankers, i.e., InfoRank (Ranking) and DNN. We do not combine InfoRank (Debiasing) with LambdaMART, as doing so is not straightforward and would be out of the scope of this paper. The results in Tables 2 and 3 show that our method consistently outperforms the existing debiasing methods with different rankers.

Official Comment by Reviewer gf6H (replying to "Response to reviewer uAZq")
Reviewer gf6H, 18 Dec 2023, 23:07
Comment: Can you provide the p-value of the statistical test between the InfoRanker and the best baseline?

Official Review of Submission262 by Reviewer dFXv
Reviewer dFXv, 29 Nov 2023, 08:17 (modified: 01 Dec 2023, 22:25)
Review: Summary: The paper titled "InfoRank: Unbiased Learning-to-Rank via Conditional Mutual Information Minimization" addresses the challenge of bias in learning-to-rank systems, specifically focusing on position and popularity biases. Learning-to-rank is crucial in applications like recommender systems, where user feedback (such as clicks) is used to rank items. However, this feedback is often biased towards items that are already ranked highly, creating a "rich-get-richer" effect. The paper proposes InfoRank, a novel paradigm that aims to simultaneously address both position and popularity biases by consolidating them into a single "observation" factor.
The approach involves minimizing mutual information between observation estimation and relevance estimation, conditioned on input features, to ensure unbiased relevance estimation. InfoRank uses an attention mechanism to capture latent correlations in user-item features and introduces a regularization term based on conditional mutual information. This framework was tested across three diverse datasets, demonstrating its effectiveness compared to state-of-the-art baselines. Strengths: InfoRank introduces a unique method for addressing biases in learning-to-rank systems. The consolidation of biases into a single observation factor and the use of conditional mutual information for debiasing are novel and potentially impactful contributions. The framework was evaluated across three diverse datasets, ensuring that the findings are robust and applicable across different scenarios. The paper suggests that the conditional mutual information minimization approach can potentially enhance other ranking models, indicating a broader impact beyond the specific framework of InfoRank. Weaknesses: The method, while effective, appears complex in terms of implementation. This complexity might limit its adoption in practical settings where simpler solutions are preferred. Although the paper shows effectiveness on diverse datasets, real-world applicability and performance in live environments have not been demonstrated. The reliance on an attention mechanism and multiple layers of modeling could lead to overfitting, especially in scenarios with limited or noisy data. Questions: How does InfoRank perform in real-world, live environments, especially where user behavior and item popularity can be highly dynamic? Is there a risk of overfitting due to the complexity of the model, and how does InfoRank address this? Can the techniques used in InfoRank be simplified for easier implementation without significantly compromising on performance? 
Ethics Review Flag: No. Ethics Review Description: No. Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community. Novelty: 5. Technical Quality: 5. Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct.

Official Comment by Reviewer dFXv
Reviewer dFXv, 14 Dec 2023, 01:00
Comment: Thanks for the authors' reply. I don't have any more questions, and I choose to keep my score.

Response to reviewer dFXv
Authors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more), 15 Dec 2023, 03:37 (modified: 15 Dec 2023, 03:48)
Comment: Thanks for your questions. Please also see the main response above. The method, while effective, appears complex in terms of implementation. This complexity might limit its adoption in practical settings where simpler solutions are preferred. (i) Theoretical and empirical evaluation of time complexity: In Section 3.5, we conducted a theoretical assessment of the time complexity of our framework. Additionally, for our revision, we plan to empirically measure the time complexity to complement and validate our theoretical findings. (ii) Framework implementation description: Appendix D provides comprehensive guidance on how to implement our framework, offering clear instructions and insights into the implementation process. As introduced in the appendix, our framework adopts a two-tower architecture, a widely utilized structure on real-world platforms as highlighted in [1][2] and summarized in [3]. This architectural choice ensures our model's ease of deployment in real-world scenarios, mitigating concerns related to computational intensity and facilitating practical implementation in diverse use cases.
[1] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. [2] Deep Interest Network for Click-Through Rate Prediction. [3] IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System. Although the paper shows effectiveness on diverse datasets, real-world applicability and performance in live environments have not been demonstrated. Thanks for your suggestion. We are seeking opportunities to collaborate with industry to evaluate our framework on livestream platforms. The reliance on an attention mechanism and multiple layers of modeling could lead to overfitting, especially in scenarios with limited or noisy data. Is there a risk of overfitting due to the complexity of the model, and how does InfoRank address this? We have already evaluated InfoRank with various amounts of training data and observed relatively robust performance across different data volumes. To further enhance robustness in practice, adding an L2 regularization term is a standard solution that effectively addresses overfitting: by penalizing large parameter values, it prevents excessive reliance on specific features during training. Moreover, overfitting in recommendation systems has been extensively investigated in the literature, e.g., [4], and is out of the scope of this paper. [4] Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models. How does InfoRank perform in real-world, live environments, especially where user behavior and item popularity can be highly dynamic? Our framework concentrates on harnessing the inherently biased historical feedback derived from user interactions. In practical operation, our system actively acquires data from the platform in real time.
Subsequently, it utilizes InfoRank to iteratively update the user modeling procedures or recommendation strategies. This allows us to model the evolution of "dynamic" or "drifting" user interests by analyzing and incorporating insights gleaned from users' historical interaction logs. Additionally, item popularity can change at high frequency, and one direct solution is to update the system more frequently; effectively managing these fluctuations relies heavily on the holistic design and efficiency of the entire update pipeline. Can the techniques used in InfoRank be simplified for easier implementation without significantly compromising performance? (i) Our primary proposition is a regularization term that promotes conditional independence between the observation and relevance factors. This regularization integrates seamlessly into any "two-tower" architecture, such as the DNN, that inherently incorporates estimations of observation and relevance. We evaluated our debiasing component with the DNN ranker and detailed these findings in the provided table. (ii) Furthermore, Appendix D contains a comprehensive discussion of the practical implementation feasibility of our framework, covering various aspects of the implementation process and offering insights into the practicality and execution of our proposed approach.

Official Review of Submission262 by Reviewer 9oe5
Reviewer 9oe5, 28 Nov 2023, 03:23 (modified: 01 Dec 2023, 22:25)
Review: The paper discusses the challenges associated with ranking items based on user interests, particularly the biases introduced by past click-through behaviors. To address these biases, the paper proposes a new learning-to-rank paradigm called InfoRank.
InfoRank aims to simultaneously handle position and popularity biases by consolidating them into a unified observation factor. The approach involves minimizing the mutual information between the observation and relevance estimations conditioned on input features, ensuring bias-free relevance estimation. The implementation includes an attention mechanism to capture latent correlations within user-item features and a regularization term based on conditional mutual information to promote conditional independence. Experimental evaluations on three extensive recommendation and search datasets demonstrate that InfoRank produces more precise and unbiased ranking strategies. The work is timely and relevant to the WebConf. Questions: The baselines used are strong. However, I wonder why the baselines in references 24, 47, and 28 are not used, especially since your approach unifies these biases. Ethics Review Flag: No. Ethics Review Description: Dataset-based work, hence no obvious issues noticed. Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community. Novelty: 5. Technical Quality: 5. Reviewer Confidence: 2: The reviewer is willing to defend the evaluation, but it is likely that the reviewer did not understand parts of the paper.

Response to reviewer 9oe5
Authors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more), 15 Dec 2023, 03:38 (modified: 15 Dec 2023, 03:50)
Comment: Thanks for your suggestion. Please also see the main response above. The baselines used are strong. However, I wonder why the baselines in references [1], [2] and [3] are not used, especially since your approach unifies these biases. [1] A Deep Recurrent Survival Model for Unbiased Ranking. 2020. [2] Correcting Popularity Bias by Enhancing Recommendation Neutrality. 2014. [3] Unbiased Learning to Rank via Propensity Ratio Scoring. 2020. Thanks for your suggestion.
We agree that incorporating recent papers as additional baseline methods would make our paper more solid. Therefore, we have included three recent papers [3][4][5] as baselines in our paper. Due to time limitations, we report results under the Yahoo (UBM), LETOR (UBM), and Adressa (UBM) settings introduced in Table 2; the tables are identical to those in our response to all reviewers above. Here, Vectorization is proposed in [4] and UPE in [5]; we combine each of them with both the DNN ranker and the InfoRank ranker. PRS is proposed in [3]; we combine it only with LambdaMART, as it is originally designed for pair-wise rankers. For ease of comparison, we also include the results of our InfoRank in the tables. These results further verify the superiority of InfoRank. We will complete Tables 2 and 3 with the above new baselines in our revision.
[4] Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank. 2022. [5] Unconfounded Propensity Estimation for Unbiased Ranking. 2023. We do not compare InfoRank against [1], because it is not straightforward to adopt InfoRank into the RNN ranker used in [1]; and we do not compare InfoRank against [2], since [2] is specifically designed to address popularity bias and does not align well with the evaluation settings in Tables 2 and 3.

Official Review of Submission262 by Reviewer yuH2
Reviewer yuH2, 22 Nov 2023, 20:14 (modified: 01 Dec 2023, 22:25)
Review: [Summary] This paper targets the unbiased learning-to-rank problem, with a particular focus on position and popularity biases. It proposes a new framework called InfoRank. InfoRank employs two separate MLP modules: one to estimate relevance given the observation and user-item features, and the other to estimate the observation given the user-item features. The click is then estimated by multiplying the outputs of these MLP modules. The InfoRank framework learns using a binary cross-entropy loss on ground-truth click data, along with a regularization term based on conditional mutual information. The experiments are conducted with three ranking models (i.e., DNN, LambdaMART, and InfoRank) on three datasets (i.e., Yahoo, LETOR, and Adressa). [Strengths] This paper targets an important research problem for ranking models. The proposed method is explained in detail and is easy to understand. The paper provides experimental results with statistical significance tests. [Weaknesses] My major concern pertains to the novelty of this paper. The major distinction of InfoRank is to promote conditional independence of relevance and observation with two separate MLP modules.
The idea of decomposing implicit user feedback into observation and relevance has been extensively studied in the literature, often estimated by independent models [1]. Furthermore, the conditional independence of relevance and observation, also referred to as the unconfoundedness assumption, has recently been questioned by researchers [2]. In this context, I believe that the novelty and impact of the proposed framework are limited. [1] Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback. WSDM '20. [2] Reconsidering Learning Objectives in Unbiased Recommendation with Unobserved Confounders. KDD '23. I believe this paper overlooks a very closely related work [3] that also decouples the effects of relevance and observation for the unbiased learning-to-rank problem. I recommend that the authors discuss and compare this work in the paper. [3] LBD: Decouple Relevance and Observation for Individual-Level Unbiased Learning to Rank. NeurIPS '22. In line 446, shouldn't it be "user-item features" instead of "user-features"? Questions: Please refer to the weaknesses described in my review. Thank you. Ethics Review Flag: No. Ethics Review Description: I don't have ethical concerns. Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community. Novelty: 2. Technical Quality: 3. Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct.

Response to reviewer yuH2
Authors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more), 15 Dec 2023, 03:38 (modified: 15 Dec 2023, 03:51)
Comment: Thanks for your questions. Please also see the main response above. My major concern pertains to the novelty of this paper. The major distinction of InfoRank is to promote conditional independence of relevance and observation with two separate MLP modules.
The idea of decomposing implicit user feedback into observation and relevance has been extensively studied in the literature, often estimated by independent models [1]. Furthermore, the conditional independence of relevance and observation, also referred to as the unconfoundedness assumption, has recently been questioned by researchers [2]. In this context, I believe that the novelty and impact of the proposed framework are limited. [1] Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback. WSDM '20. [2] Reconsidering Learning Objectives in Unbiased Recommendation with Unobserved Confounders. KDD '23. Thanks for your question. Indeed, we summarize multiple ranking biases into a single observation factor and propose a regularization term based on the conditional independence of the observation and relevance factors. Here is our response to the questions posed in [2] regarding the unconfoundedness assumption: (i) [2] reveals that the unconfoundedness assumption might not hold unless every conceivable factor influencing users' decision-making is incorporated as a feature. Our proposed attention mechanism addresses this concern by enabling our model to capture intricate correlations, thereby enriching the existing feature set. While the technique in [2] focuses on uncovering unobserved confounders, our approach discovers correlations that were previously challenging to model with traditional methods, enhancing the capacity to discern nuanced patterns beyond what prior approaches could accommodate. (ii) [2] discusses that the theoretical analysis of the re-weighting objective might not generalize to the various unbiased algorithms that rely on unbiased uniform data.
There are essentially two categories of unbiased ranking models: those that operate on unbiased uniform data and those that leverage data collected from real-world platforms. Our paper primarily focuses on the latter category, a distinct direction within unbiased ranking methodologies that aligns with real-world data. (iii) [2] reveals that the re-weighting objective requires accurate estimation of the exposure probability, a task often fraught with challenges and subject to significant variance. In our approach, we diverge from traditional methods that rely on a single position-aware user click model, as in prior works [4][5]. Instead, we derive the estimation of the observation factor from correlations among user-item features, which allows us to estimate it without relying on conventional click models and thereby addresses the challenges of exposure probability estimation. [4] Position Bias Estimation for Unbiased Learning to Rank in Personal Search. [5] An Unbiased Pairwise Learning-to-Rank Algorithm. I believe this paper overlooks a very closely related work [3] that also decouples the effects of relevance and observation for the unbiased learning-to-rank problem. I recommend that the authors discuss and compare this work in the paper. [3] LBD: Decouple Relevance and Observation for Individual-Level Unbiased Learning to Rank. Thanks for your suggestion. We agree that our paper is related to [3], as our method also addresses the coupling effect as defined in its Section 4.1. Here is a summary of the distinctions between [3] and our paper: (i) Unlike the soft decoupling method introduced in [3], with its associated concepts such as Lipschitz decoupling and Bernoulli decoupling, our approach employs an attention model.
This attention mechanism operates automatically, enabling the model to discern intricate correlations without relying on explicitly defined decoupling strategies. (ii) While [3] primarily concentrates on updating the observation and relevance estimations through an analysis of observation and relevance changes, our methodology analyzes the conditional mutual information between the observation and relevance estimations. This obviates the need for observation Lipschitz assumptions and additional sampling: by exploring conditional independence, our approach encourages the attention model to allocate distinct weights to features for observation and click estimations, facilitating a more nuanced and adaptive weighting scheme without specific assumptions or sampling techniques. We will add this discussion to our revision to enhance the context and understanding for readers.

Official Review of Submission262 by Reviewer gf6H
Reviewer gf6H, 22 Nov 2023, 06:25 (modified: 01 Dec 2023, 22:25)
Review: The paper tackles the important problem of position and popularity biases in recommendation systems. It unifies these two biases into a single observation factor and debiases it. In particular, the paper learns an end-to-end click model that can be separated into a product of two attention-based binary classifiers: an observation model and a relevance model. A click is predicted only if both classifiers predict observation and relevance. Finally, the paper suggests the resulting relevance model as the unbiased ranker. The paper tests the solution on synthetic data and outperforms several baselines. Presentation: I found the paper well written and well motivated. Novelty: The paper's main claim to fame is providing a unified approach to deal with both position and popularity biases.
However, I do not believe this unified approach is new. Several papers have designed a unified approach to deal with several biases, including popularity and selection biases (for example, AutoDebias by Chen et al. 2021 debiases popularity bias as part of the exposure bias). In addition, the idea of learning through parallel observation and relevance models, such that a click is the "and" operation of the two, is well known in debiasing. However, I don't remember seeing attention-based models with the same input used for this task - if this is the main novelty of the paper, I suggest emphasizing this point. Experiment: The paper demonstrates the power of its method on synthetic data using several observation models and shows meaningful results. I find the use of several observation models important to convince that the method will also work well in a real-life scenario. However, and I know it is challenging to perform, there is no replacement for an online experiment in debiasing. Finally, I find the baselines relatively weak, as no baseline method from the last 4 years was used. Since I view the main novelty as the attention-based architecture, I believe it should be compared to recent works. In conclusion, the paper has its merits, but I believe the main contribution should be restated, and additional baselines should be added to the experiments. Questions: What is the novelty of the paper? Is it indeed the use of the attention-based architecture, or is it the unified approach for debiasing several types of biases? If it is the latter, please explain the difference from previous works.
Ethics Review Flag: No Ethics Review Description: - Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community Novelty: 3 Technical Quality: 4 Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct Response to reviewer gf6H Official CommentAuthors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more)15 Dec 2023, 03:40 (modified: 15 Dec 2023, 03:52)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, AuthorsRevisions Comment: Thanks for your suggestions. Please also see the main response above. The paper's main claim-to-fame is providing a unified approach to deal with both position and popularity biases. However, I do not believe this unified approach is new. Several papers have designed a unified approach to deal with several biases, including popularity and selection biases (for example, AutoDebias by Chen et al. 2021 debiases popularity bias as part of the exposure bias). In addition, the idea of learning through parallel observation and relevance models, such that a click is the "and" operation of the two, is well known in debiasing. However, I don't remember seeing attention-based models with the same input used for this task - if this is the main novelty of the paper, I suggest emphasizing this point. We acknowledge that simultaneously addressing the position and popularity biases has been investigated in previous papers such as [1]. We want to clarify that our solution significantly differs by consolidating the influence of position and popularity biases into a unified factor. Another key innovation in our work lies in leveraging an attention-based two-tower architecture to autonomously disentangle user-item features pertaining to relevance and observation factors. Our model supervises this disentanglement process by introducing a regularization term that emphasizes the conditional independence between observation and relevance factors.
This approach represents a departure from prior methodologies and stands as a distinctive contribution in our study. Furthermore, it is important to highlight that the two-tower architecture we employ is widely adopted in real-world platforms, as in [2][3] and as summarized in [4]. This characteristic renders our model easily deployable in real-world use cases, alleviating concerns about heavy computational loads and facilitating practical implementation (as discussed in Appendix D). [1] AutoDebias: Learning to Debias for Recommendation. [2] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction. [3] Deep Interest Network for Click-Through Rate Prediction. [4] IntTower: the Next Generation of Two-Tower Model for Pre-Ranking System. Response to reviewer gf6H Official CommentAuthors (Jiarui Jin, Weinan Zhang, Jun Wang, Julian McAuley, +3 more)15 Dec 2023, 03:40 (modified: 15 Dec 2023, 03:52)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, AuthorsRevisions Comment: Experiment: The paper demonstrates the power of its method on synthetic data using several observation models and shows meaningful results. I find the use of several observation models important to convince that the method will also work well in a real-life scenario. However, and I know it is challenging to perform, there is no replacement for an online experiment in debiasing. Finally, I find the baselines relatively weak, as no baseline method from the last 4 years was used. Since I view the main novelty as the attention-based architecture, I believe it should be compared to recent works. Thanks for your suggestion. We are actively seeking an opportunity to assess the efficacy of our proposed method within an online platform. We agree that incorporating recent papers as additional baseline methods would make our paper considerably more solid. Therefore, we have included three recent papers [5][6][7] as baselines in our paper.
Due to the time limitation, we report the results under the settings of Yahoo (UBM), LETOR (UBM), and Adressa (UBM), as introduced in Table 2.

The following table summarizes the results on Yahoo (UBM).

Ranker      Debiasing Method  MAP    N@3    N@5    N@10
InfoRanker  InfoRank          0.845  0.736  0.739  0.779
InfoRanker  UPE               0.844  0.721  0.710  0.750
InfoRanker  Vectorization     0.841  0.698  0.701  0.744
LambdaMART  PRS               0.838  0.717  0.727  0.760
DNN         InfoRank          0.828  0.683  0.696  0.734
DNN         UPE               0.823  0.680  0.697  0.732
DNN         Vectorization     0.824  0.682  0.694  0.729

The results on LETOR (UBM) are reported as follows.

Ranker      Debiasing Method  MAP    N@3    N@5    N@10
InfoRanker  InfoRank          0.650  0.380  0.460  0.541
InfoRanker  UPE               0.642  0.378  0.449  0.536
InfoRanker  Vectorization     0.640  0.377  0.444  0.534
LambdaMART  PRS               0.633  0.367  0.422  0.509
DNN         InfoRank          0.637  0.360  0.416  0.499
DNN         UPE               0.629  0.356  0.416  0.495
DNN         Vectorization     0.628  0.357  0.412  0.492

We also report the results on Adressa (UBM) as follows.

Ranker      Debiasing Method  MAP    N@3    N@5    N@10
InfoRanker  InfoRank          0.801  0.691  0.715  0.739
InfoRanker  UPE               0.794  0.677  0.705  0.728
InfoRanker  Vectorization     0.795  0.678  0.703  0.729
LambdaMART  PRS               0.796  0.671  0.714  0.734
DNN         InfoRank          0.786  0.667  0.692  0.725
DNN         UPE               0.782  0.660  0.681  0.719
DNN         Vectorization     0.786  0.665  0.687  0.721

Here, Vectorization is proposed in [5] and UPE is proposed in [7]; we combine [5] and [7] with both the DNN ranker and the InfoRank ranker. PRS is proposed in [6]; we only combine it with LambdaMART, as it is originally designed for pair-wise rankers. For ease of comparison, we also include the results of our InfoRank in the tables. These results further verify the superiority of InfoRank. We will complete Tables 2 and 3 with the above new baselines in our revision. [5] Scalar is Not Enough: Vectorization-based Unbiased Learning to Rank. 2022. [6] Unbiased Learning to Rank via Propensity Ratio Scoring. 2020. [7] Unconfounded Propensity Estimation for Unbiased Ranking. 2023.
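For concreteness, the factorized click model discussed throughout this thread - a click predicted as the product of attention-based observation and relevance estimates over shared user-item features, with a regularizer discouraging dependence between the two factors - can be sketched as follows. This is a minimal hypothetical illustration in PyTorch, not the authors' implementation: all module names, the toy per-feature embedding, and the batch-correlation penalty (a crude stand-in for the paper's conditional-mutual-information regularizer) are assumptions.

```python
# Hypothetical sketch of a two-tower click model: P(click) = P(obs) * P(rel).
# Each tower applies its own attention over the SAME user-item features, so
# the towers can learn distinct feature weightings for observation vs. relevance.
import torch
import torch.nn as nn


class TwoTowerClickModel(nn.Module):
    def __init__(self, num_feats: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Linear(1, dim)  # toy per-feature embedding (illustrative)
        # one attention module per tower, operating on the shared feature set
        self.obs_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.rel_attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.obs_head = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.rel_head = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, feats: torch.Tensor):
        # feats: (batch, num_feats) -> (batch, num_feats, dim)
        x = self.embed(feats.unsqueeze(-1))
        o, _ = self.obs_attn(x, x, x)  # tower-specific feature weighting
        r, _ = self.rel_attn(x, x, x)
        p_obs = self.obs_head(o.mean(dim=1))  # observation probability
        p_rel = self.rel_head(r.mean(dim=1))  # relevance probability
        return p_obs, p_rel, p_obs * p_rel    # click = obs AND rel


def loss_fn(p_obs, p_rel, p_click, clicks, lam: float = 0.1):
    """Click BCE plus a penalty on dependence between the two factor estimates.

    The penalty here is a simple batch correlation, used only as a stand-in
    for the conditional-independence regularizer described in the paper.
    """
    bce = nn.functional.binary_cross_entropy(p_click.squeeze(-1), clicks)
    o, r = p_obs.squeeze(-1), p_rel.squeeze(-1)
    corr = ((o - o.mean()) * (r - r.mean())).mean()
    return bce + lam * corr.abs()
```

At inference time, only the relevance tower (`p_rel`) would be used as the debiased ranking score, matching the thread's description of the relevance model serving as the unbiased ranker.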