Reviews of 1489 Item Recommendation on Monotonic Behavior Chains

Reviewer 4 (chair)

Expertise
Knowledgeable

The Meta-Review
The reviewers all agree on the relevance to RecSys and on the interest of looking into sequences of actions to recommend items. They also agreed that the problem and solution are well presented by the authors. Some concerns were raised regarding novelty. I concur with R1 that other studies of action sequences have been conducted in closely related domains, such as web search and mobile app use and downloads, and that the authors should have reflected on these lines of work. Some comments were also raised regarding the result analysis (e.g., popularity versus stronger baselines, dataset bias). After discussion, there was agreement that the paper makes enough of a contribution to warrant publication. Please consider making small revisions according to the reviewers' comments.

Final Recommendation after Meta-Review
Probably accept: I would argue for accepting this paper.

Reviewer 1 (PC) Review

Rating
Probably accept: I would argue for accepting this paper.

Expertise
Passing Knowledge

Contribution
The paper's goal is to improve current item recommendation frameworks by adding an additional dimension for chains of consecutive user actions. Furthermore, the authors propose an optimization criterion that uses both a monotonic scoring function and probabilistic constraints on the user actions ("stages"). The proposed approach outperforms the state-of-the-art/baseline techniques to some extent on 4 public datasets.

Relevance to RecSys
The paper is of direct relevance to RecSys.

Your Review
The paper is well written and easy to follow. I am not sure how novel the idea of applying sequential actions to item recommendation systems is, as there have been a number of works using sequences of events for next-action prediction. See, e.g., Takeshi Kurashima, Tim Althoff, and Jure Leskovec. 2018. Modeling Interdependent and Periodic Real-World Action Sequences. In Proceedings of the 2018 World Wide Web Conference (WWW '18).
The main limitations of this work, as I see them, are as follows:
(1) The item recommendation is based only on knowledge of user actions, and it is not clear how other features of users and items can be incorporated into the model.
(2) The number of stages in the behavioral chains is limited (in the examples, it is never more than 5). That is fine; however, it would be nice to discuss how it might change the performance of the model.
Some other questions:
(3) Although the authors discuss the sparsity of their matrix, I wonder how the approach would work in the ad-tech case, where the imp->click->conversion chain almost never completes.
(4) Finally, it would be nice to have a bit more explanation of how the graphs in Section 5.4 are built and how different they would be if one took into account only the stage slices.

Reviewer 2 (PC) Review

Rating
Probably accept: I would argue for accepting this paper.

Expertise
Passing Knowledge

Contribution
This paper proposes a recommendation technique, called ChainRec, that is capable of exploiting different types of user interactions with a system. These interactions can be either explicit or implicit and can take different forms, such as a purchase, review, click, rating, etc. The authors present an observation, called Monotonic Dependency: the presence of a more explicit interaction (a stronger signal) implies the presence of a more implicit interaction (a weaker signal). The proposed recommender technique has been evaluated using 5 different large datasets (i.e., Steam, YooChoose, Yelp, GoogleLocal, and GoodRead). The results show superior performance of the proposed technique over almost the entire set of datasets.
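To check that I read the Monotonic Dependency observation correctly: each (user, item) pair can be summarized by the strongest signal it reaches, and that stage implies positive labels for all weaker stages. A minimal sketch of this reading follows; the stage names, ordering, and encoding are my own illustrative assumptions, not the authors' code.

```python
# Illustrative reading of the Monotonic Dependency observation; the stage names,
# ordering, and encoding are assumptions for this sketch, not the authors' code.
STAGES = ["click", "purchase", "review", "recommend"]  # weakest -> strongest

def max_stage(observed):
    """Summarize a (user, item) pair by the strongest observed interaction."""
    reached = [STAGES.index(a) for a in observed if a in STAGES]
    return max(reached) if reached else -1  # -1 means no interaction at all

def implied_labels(observed):
    """Reaching stage l implies a positive label for every stage <= l."""
    top = max_stage(observed)
    return {stage: idx <= top for idx, stage in enumerate(STAGES)}

# A purchase implies a click, but says nothing positive about review/recommend.
print(implied_labels({"purchase"}))
# {'click': True, 'purchase': True, 'review': False, 'recommend': False}
```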
Relevance to RecSys
The paper is relevant to the RecSys community, as it is related to the following topics:
- Novel machine learning approaches to recommendation algorithms
- User modelling

Your Review
The paper is written nicely and is easy to read.
The related work section could be extended with some more recent related papers, such as:
- Gurbanov, T., & Ricci, F. (2017, April). Action prediction models for recommender systems based on collaborative filtering and sequence mining hybridization. In Proceedings of the Symposium on Applied Computing (pp. 1655-1661). ACM.
The authors may briefly comment on the excellent performance of the simple itemPop baseline (in Table 3), which, in some cases, outperforms the strong baselines of group (b).
The authors may also consider the XING dataset of the RecSys Challenge 2017; see:
- Abel, Fabian, et al. "RecSys Challenge 2017: Offline and online evaluation." Proceedings of the Eleventh ACM Conference on Recommender Systems. ACM, 2017.
The authors should note that these datasets are collected by web applications where an operational recommender system suggests items to users, and hence the users mainly interact with these suggested items. This can strongly bias the collected data towards the recommendations (recommender system --> suggestions to users --> interactions of the users --> collected dataset). Hence, the recorded interactions may be substantially influenced by the type of recommender system deployed. In the best case, the proposed technique (ChainRec) can predict what has already been predicted and recommended by the recommender system running on these web applications. I think this should be discussed in the paper.
It is not clear why the results in Figure 4 differ so much across the subfigures (datasets); for instance, the performance of logMF changes substantially in each of them.
The number of embedding dimensions (K) has been set to 16; however, according to the sensitivity analysis (Figure 5), a value of about 25-35 looks like a better option.
Figure 6 is very interesting and could be enlarged a bit for better visualisation.
In Figure 7, despite the differences being statistically significant, the distributions look similar.
I would suggest the authors also consider other types of evaluation metrics, such as novelty and diversity.

* Positives
- The proposed model is capable of using different user interactions.
- Evaluation on several different datasets and against various baselines.
* Negatives
- Some inconsistencies in the results.
* Suggested Revisions
- The presentation of the results can be improved.
- The related work section can be extended.

Reviewer 3 (PC) Review

Rating
Borderline: Overall I would not argue for accepting this paper.

Expertise
Knowledgeable

Contribution
In this paper, the authors present a representation that aligns multiple types of user interactions using a monotonic behavior chain and propose a recommendation model based on it. They devise an algorithm that exploits the monotonicity property and models the different types of interactions, and they present a new optimization algorithm that can exploit this property efficiently. The paper also presents a detailed experimental evaluation on five real-world datasets and contributes a new dataset.
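To state the monotonicity I understand the model to exploit: the score for a stronger (rarer) stage should never exceed the score for a weaker one. Below is a minimal sketch of one way such monotone stage scores can be built, via cumulative non-negative (softplus) decrements; the parameterization and variable names are my own assumptions, not necessarily the authors' exact formulation.

```python
# One possible construction of stage scores that are monotone in the stage index,
# using cumulative non-negative (softplus) decrements. This is an illustrative
# assumption, not the authors' exact model.
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))  # always >= 0

def stage_scores(user_vec, item_vec, stage_vecs):
    """Return scores s_1 >= s_2 >= ... >= s_L for the L stages of the chain."""
    s = float(user_vec @ item_vec)            # base affinity for the weakest stage
    scores = [s]
    for w in stage_vecs:                      # one extra parameter vector per deeper stage
        s = s - softplus(float(w @ item_vec)) # subtract a non-negative amount
        scores.append(s)
    return scores                             # len(stage_vecs) + 1 non-increasing scores

rng = np.random.default_rng(0)
u, v = rng.normal(size=8), rng.normal(size=8)
print(stage_scores(u, v, rng.normal(size=(3, 8))))  # 4 non-increasing scores
```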
Relevance to RecSys
This paper addresses explicit and implicit feedback in recommender systems and devises an algorithm to jointly model both types of interactions; hence, the paper is relevant to the RecSys community.

Your Review
This paper makes some important contributions, but the experimental methodology lacks scientific rigor.
Strengths:
• The paper is generally well written and structured.
• The authors present a novel observation, a model with an optimization procedure, and experiments on real-world data, and they claim to contribute a new dataset.
Weaknesses:
• It is not clear how common the observed "click->purchase->review->recommend" interaction is in real-world scenarios. In real-world scenarios, even the clicks (marked as dense) are quite sparse.
• In the experimental results, it is not clear how the AUC-ROC is computed. What is the classification setup in the item ranking problem? Please explain (a sketch of the per-user AUC computation I have in mind is given at the end of this review).
• An AUC-ROC in the range of 0.96-0.98 on any real-world data seems exceedingly high to me. As mentioned above, it is not clear what is being measured here.
• In the paper, the authors claim significant improvements, but no confidence intervals or error bars are provided.
Minor issues/typos:
* Page 1, col. 2: "We typically observe relatively a few explicit …" --> rephrase.
* Figures 1 and 2 are hard to follow; please provide more explanations.
* The authors claim to have introduced a new dataset; it should be shared publicly in order for the experiments to be reproducible.
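For concreteness, the per-user AUC for item ranking that I would expect (referenced in the weaknesses above) is the fraction of (positive, negative) item pairs the model orders correctly, averaged over users. The negative-sampling scheme and the score_fn interface below are my own illustrative assumptions, not taken from the paper.

```python
# Sketch of per-user AUC for item ranking: the fraction of (positive, negative)
# item pairs the model orders correctly, averaged over users. The uniform
# negative sampling and the score_fn interface are illustrative assumptions.
import numpy as np

def user_auc(pos_scores, neg_scores):
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean(pos > neg))  # P(score(pos) > score(neg)), ties ignored

def mean_auc(score_fn, test_positives, all_items, n_neg=100, seed=0):
    """test_positives: dict mapping each user to their set of held-out positive items."""
    rng = np.random.default_rng(seed)
    aucs = []
    for user, pos_items in test_positives.items():
        candidates = [i for i in all_items if i not in pos_items]
        negs = rng.choice(candidates, size=min(n_neg, len(candidates)), replace=False)
        aucs.append(user_auc([score_fn(user, i) for i in pos_items],
                             [score_fn(user, j) for j in negs]))
    return float(np.mean(aucs))
```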