Reviewer 1.

Comment 1. However, there exist few pieces of work that use contextual bandits or related open libraries on the recommender system, which are missed by this survey paper. For instance,
● Li, L., Chu, W., Langford, J., & Schapire, R. E. A contextual-bandit approach to personalized news article recommendation.
● Cortes, David. "Adapting multi-armed bandits policies to contextual bandits scenarios." arXiv preprint arXiv:1811.04383 (2018).
Response: We appreciate the reviewer’s comment and thank the reviewer for bringing up the bandit-based methods. Because bandit-based methods were the mainstream approach before DQN was first introduced, we are not able to cover this family in our main body. We have added a sentence to the introduction section describing bandit methods and their limitations.

Reviewer 3.

Comment 1. There are some typos in the manuscript. For example, in Eq.(41), I think the A(s) inside the clip function should be omitted.
Response: We appreciate the reviewer’s comment. Yes, the A(s) is duplicated, and we have removed it in the revised version.

Comment 2. Page 31: '' handled similarly---s underthe'', need to be fixed.
Response: We appreciate the reviewer’s comment. We have fixed this typo. The revised sentence is highlighted in red in the revised manuscript.

Comment 3. The claim of MuJoCo may not be very accurate. As for now, it has been open-sourced, and more environments are added. The number of actions may exceed 100. While it is still much less than RS. But it would be grateful to make it clear.
Response: We appreciate the reviewer’s comment. Although MuJoCo may now have a larger number of actions, it is still unlikely to reach the number we face in RS. Hence, we have removed this footnote to avoid ambiguity.

Comment 4. The L_{DRL} on Eq.(37), can you specify which kind of loss function can be used here? As it contains three different approaches and each has different loss functions.
Response: We appreciate the reviewer’s comment.
We have added a sentence on Page 32 explaining how to choose L_{DRL}; it is highlighted in red.