#########################
SUBMISSION: 100
TITLE: Locality-Sensitive State-Guided Experience Replay Optimization for Sparse-Reward in Online Recommendation

------------------------- METAREVIEW ------------------------
All reviewers reach a consensus that this work investigates an important problem and that its techniques are highly original. At the same time, the reviewers point out several areas where the paper could be strengthened, such as a comparison with non-reinforcement-learning-based methods and a more thorough presentation of background knowledge. To the best of our knowledge, these changes are not difficult to make. We hope the authors will address them in the camera-ready version.

----------------------- REVIEW 1 ---------------------
SUBMISSION: 100
TITLE: Locality-Sensitive State-Guided Experience Replay Optimization for Sparse-Reward in Online Recommendation
AUTHORS: Xiaocong Chen, Lina Yao, Julian Mcauley, Weili Guan, Xiaojun Chang and Xianzhi Wang

----------- Relevance to SIGIR -----------
SCORE: 4 (good)
----------- Technical soundness -----------
SCORE: 4 (good)
----------- Quality of presentation -----------
SCORE: 4 (good)
----------- Adequacy of citations -----------
SCORE: 4 (good)
----------- Reproducibility of methods -----------
SCORE: 4 (good)
----------- Strengths -----------
1. It provides a new direction for addressing reward sparsity in reinforcement learning based recommender systems, which has not been investigated before.
2. Rigorous proofs provide a theoretical guarantee for the proposed method.
3. The paper is well written and easy to follow. The motivation is convincing, and extensive experiments demonstrate the superiority of the proposed method.
----------- Weaknesses -----------
1. The paper requires readers to have basic background knowledge of reinforcement learning, such as experience replay and its drawbacks. The authors could introduce readers to reinforcement learning gradually by starting from a generic, non-reinforcement scenario or problem.
2. It would be hard for researchers without a background in reinforcement learning based recommender systems to understand the experiments, as the evaluation process is quite different from traditional recommendation evaluation.
----------- Overall recommendation -----------
SCORE: 2 (accept)
----------- Detailed comments to authors -----------
In this paper, the authors target the reward sparsity problem, which is well known and generally recognized as an open challenge in existing reinforcement learning research; recommender systems represent a typical, if not the most representative, application in which this challenge is especially urgent and evident. This work focuses on the reinforcement learning paradigm, which can cope with users' dynamic interests, and proposes a new experience replay method to overcome the sparse-reward challenge, which existing reinforcement learning methods cannot address well. The experiments show that the proposed LSER can significantly relieve the adverse impact of this problem on large simulation platforms such as VirtualTB, and the theoretical analysis, supported by rigorous proofs, guarantees the effectiveness and performance bounds of the proposed method. Some drawbacks of this paper are listed below.
Firstly, from the model side, the authors have not provided a layman-friendly introduction, and the research topic of reinforcement learning based methods requires readers to have sufficient background in relevant areas such as experience replay and off-policy algorithms to benefit from the results of this paper. Secondly, from the experiment side, owing to the particularities of reinforcement learning, the evaluation process differs from that of traditional algorithms, which may confuse readers who are not familiar with this area.
----------- Nominate for Best Paper -----------
SELECTION: no

----------------------- REVIEW 2 ---------------------
SUBMISSION: 100
TITLE: Locality-Sensitive State-Guided Experience Replay Optimization for Sparse-Reward in Online Recommendation
AUTHORS: Xiaocong Chen, Lina Yao, Julian Mcauley, Weili Guan, Xiaojun Chang and Xianzhi Wang

----------- Relevance to SIGIR -----------
SCORE: 5 (excellent)
----------- Technical soundness -----------
SCORE: 4 (good)
----------- Quality of presentation -----------
SCORE: 4 (good)
----------- Adequacy of citations -----------
SCORE: 4 (good)
----------- Reproducibility of methods -----------
SCORE: 5 (excellent)
----------- Strengths -----------
1. The paper is well motivated, well structured, and easy to follow.
2. The mathematical proof provides a theoretical guarantee for the proposed method with regard to the potential information loss caused by dimensionality reduction.
3. It provides extensive experiments on three online simulation platforms and shows the superiority of the proposed method.
----------- Weaknesses -----------
1. Although the proposed method is reinforcement learning based and the whole paper focuses on the reinforcement learning area, it would provide more insight if the authors conducted comparative experiments with some non-reinforcement-learning-based methods.
2. It would be good if the authors also compared LSH with other dimensionality reduction methods.
----------- Overall recommendation -----------
SCORE: 2 (accept)
----------- Detailed comments to authors -----------
1. Although the proposed method is reinforcement learning based and the whole paper focuses on the reinforcement learning area, it would provide more insight if the authors conducted comparative experiments with some non-reinforcement-learning-based methods.
2. It would be good if the authors also compared LSH with other dimensionality reduction methods.

This paper focuses on deep reinforcement learning based recommender systems; the authors aim to address the problem that existing methods suffer in sparse-reward situations. More specifically, they propose a new experience replay method for deep reinforcement learning based recommender systems. Overall, this paper provides a new angle on improving the efficiency of deep reinforcement learning based recommender systems and is worthy of investigation. The design and application of hashing and locality sensitivity for experience replay optimization appear to be a sound direction for addressing reward sparseness in online recommendation scenarios. Regarding reproducibility, pseudo code for the key component is provided, and all the evaluation platforms are available online, so the audience should find it relatively easy to reproduce the results. Figure 1 presents an overview of the proposed LSER; however, it is not often referred to in the paper. It would be a good idea for the authors to refer to this figure when presenting and explaining the relevant concepts and the connections between them. Some of the formulas are not numbered; I am not sure whether there is a particular reason for this, but consistency would be preferred. Algorithm 1 is referred to on page 4 but presented on page 5; if possible, the algorithm should be presented where it is introduced. In addition, a detailed discussion of the workflow of this algorithm is missing. Figure 5 is presented on page 9, while Section 3 ends on page 8. Would it be possible to move Figure 5 closer to Section 3, preferably within Section 3?
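For readers less familiar with this direction, the sketch below illustrates one generic way a replay buffer can use locality-sensitive hashing of states (a random-projection, SimHash-style code) to group experiences by state-space region. This is a minimal sketch under my own assumptions, not the authors' LSER implementation; the class name, the hash construction, and the bucket-balanced sampling rule are all hypothetical.

    # Hypothetical illustration only -- not the paper's LSER algorithm.
    import numpy as np
    from collections import defaultdict, deque

    class LSHReplayBuffer:
        """Replay buffer that groups transitions by an LSH code of the state."""

        def __init__(self, state_dim, n_bits=8, bucket_capacity=10_000, seed=0):
            rng = np.random.default_rng(seed)
            # Random hyperplanes: nearby states tend to fall on the same sides,
            # so similar states receive the same hash code (bucket).
            self.planes = rng.standard_normal((n_bits, state_dim))
            self.buckets = defaultdict(lambda: deque(maxlen=bucket_capacity))

        def _hash(self, state):
            # Sign pattern of the random projections -> integer bucket id.
            bits = (self.planes @ np.asarray(state, dtype=float) > 0).astype(int)
            return int("".join(map(str, bits)), 2)

        def add(self, state, action, reward, next_state, done):
            self.buckets[self._hash(state)].append(
                (state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Spread the batch across occupied buckets so that rarely visited
            # (often sparse-reward) regions of the state space still get replayed.
            keys = list(self.buckets.keys())
            if not keys:
                raise ValueError("buffer is empty")
            batch = []
            for key in np.random.choice(keys, size=batch_size, replace=True):
                bucket = self.buckets[key]
                batch.append(bucket[np.random.randint(len(bucket))])
            return batch

The intuition shown here is only the bucketing mechanism; the paper's actual rules for which experiences to store and replay should be consulted for the real design.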
----------- Nominate for Best Paper -----------
SELECTION: no

----------------------- REVIEW 3 ---------------------
SUBMISSION: 100
TITLE: Locality-Sensitive State-Guided Experience Replay Optimization for Sparse-Reward in Online Recommendation
AUTHORS: Xiaocong Chen, Lina Yao, Julian Mcauley, Weili Guan, Xiaojun Chang and Xianzhi Wang

----------- Relevance to SIGIR -----------
SCORE: 5 (excellent)
----------- Technical soundness -----------
SCORE: 4 (good)
----------- Quality of presentation -----------
SCORE: 4 (good)
----------- Adequacy of citations -----------
SCORE: 5 (excellent)
----------- Reproducibility of methods -----------
SCORE: 4 (good)
----------- Strengths -----------
1. The paper is clearly and sufficiently motivated, and deals with an important problem.
2. The paper tackles the problem from several aspects and combines them into an integral solution. The theoretical proof is a plus in illustrating how the proposed method addresses the targeted problem.
3. The extensive experiments show that the proposed LSER outperforms other existing ER methods in deep reinforcement learning based recommender systems.
----------- Weaknesses -----------
1. A visual comparison of the proposed approach with previous approaches would greatly help highlight the key differences and novel contributions.
2. I would expect to see the performance of the proposed approach in real-world applications.
----------- Overall recommendation -----------
SCORE: 2 (accept)
----------- Detailed comments to authors -----------
This paper is about deep reinforcement learning in recommender systems. It proposes a new experience replay method, named LSER, to help the agent select which experiences to store and replay. To validate the proposed method, the authors also provide proofs of the performance bounds, which improves soundness. The paper is well written and self-contained. In addition, the provided pseudo code is straightforward and easy to understand, which improves reproducibility. While the work appears solid in terms of experiments, the selection of compared methods needs to be justified. Another suggestion is to introduce more background on reinforcement learning to help readers understand some key concepts in this paper. There is an extra word "all" in the right column on page 7; the authors should proofread the paper and eliminate this and any other similar cases.
----------- Nominate for Best Paper -----------
SELECTION: no