Reviewer 5 (Primary SPC (meta-reviewer))

The Meta-Review

The paper studies recommendations on live streaming platforms and investigates repeated consumption and ephemeral availability. It proposes a sequence-based end-to-end architecture for recommending streamers that incorporates both repeated consumption and ephemeral availability.

Positives:
- interesting and understudied problem
- the paper does a good job of showing how streaming sits at the intersection of repeated consumption and ephemeral availability
- datasets will be released
- an interesting architecture is proposed

Negatives:
- parts of the paper are hard to follow (writing, order of presentation, unreferenced figures that are not always clear)
- crucial details are missing on the experimental setup and the model
- missing related work
- the analysis of the results lacks empirical evidence and contains many purely descriptive observations (Section 6.5)
- source code is not available (the authors address this in the rebuttal and state that they will release it after acceptance)
- the paper contains several typos and formatting errors
- other domains with similar repeat-consumption problems are not cited

Overall, these negatives outweigh the positives too heavily for the paper to be accepted at RecSys, even though all reviewers see merit in the paper. The authors are encouraged to use the detailed reviews to improve and revise their paper.

Final Recommendation after Meta-Review
Reject: I do not think this paper should be accepted

Reviewer 4 (Secondary SPC (SPC reviewer))
Expertise: Expert

Your Review
This is a suggested guide for the format of your review; note that this is merely a suggestion, feel free to delete sections that are not relevant to your review.
==================================================
(1) Summarize in a few sentences the contribution, and its relevance to recommender systems:

The paper describes a model that ranks streamers for users in real time, in the dynamic setting of a live-streaming platform, based on historical interactions and current availability. The model also considers repeat consumption of streamers. A dataset from Twitch was collected and will be released to the community.

==================================================
(2) List (2 or 3) of the main positive points about the paper, i.e., arguments for acceptance:

1. The problem is interesting: a special domain with very few studies that address this challenge.
2. Release of a large dataset for the research community.

==================================================
(3) List (2 or 3) of the main negative points about this paper, i.e., arguments for rejection:

1. The writing is very unclear and the paper is very hard to follow. The order of presentation does not make sense. Figures are not always referenced in the text, or are referenced in a single minimal sentence, and are otherwise very difficult to understand (e.g., Fig. 4).

==================================================
(4) Detailed review:

The authors explain in the introduction what is special about their setting: 1) the evolving items, and specifically the need to identify the availability of items (streamers) at evaluation time; and 2) the idea of repeated consumption. They say that users may want to watch content from the same streamer again. This does not look different from consuming movies by the same director, consuming the same music items again, or consuming music by the same singers, yet no relevant related work from these fields is mentioned. The idea of dynamic content is also relevant in other domains, such as social media or news items, but no related work is cited and no explanation of how this domain differs is given. Why is a sequence recommendation method necessary if streamers are what is being recommended?

Temporal dynamics is chosen as another related-work subject; the paragraph describes temporal patterns, but does not say how they are relevant here, and it talks about repeat consumption without explaining how these domains are connected. The authors write: "They propose a hybrid model that predicts user choice based on a combination of recency and quality. In this study, we consider repeat consumption of the same channel broadcasting new content, which differs from what past work has considered." I understand the first part about combining some temporal effect, but not how the second part relates to it. Why are the ideas of combining new and old content in the same sentence? The whole section is unclear, as is the next paragraph about recommender systems, which mixes many methods: sequential recommendation, self-attention, entity linking. I could not understand where the paper is positioned in the literature.

I could not understand why readers are referred to Figure 3 (it is not explained); I did not understand what is presented there (what is p?), and right after that they are referred to Figure 11 (for the daily and weekly dynamics).

(4c) Typos, formatting issues, etc.

The structure is very hard to follow. The authors present experiments and results and then ask questions about various factors, but the answers are not clearly given.

(4d) Appropriateness of paper length

In spite of the writing issues, the work is interesting: a special domain, and a lot of experiments with good results, but very hard to read. The authors should work on the writing.

==================================================
(5) Any further details about your specific expertise or perspective on this topic:

Human Participants Research: Does not report on human-participants research
Ethical considerations: (blank)

Review Rating
Borderline: Overall I would not argue for accepting this paper.
Reviewer 1 (PC)
Expertise: Knowledgeable

Your Review

==================================================
(1) Summarize in a few sentences the contribution, and its relevance to recommender systems:

This paper proposes a model for live-streaming item recommendation, taking into account the importance of users' repeat consumption and temporal dynamics. The method is evaluated on a novel dataset collected by the authors on the live streaming service Twitch. The proposed model is shown to outperform baselines on this dataset.

==================================================
(2) List (2 or 3) of the main positive points about the paper, i.e., arguments for acceptance:

1. This paper is generally well written, with good motivation and clear descriptions of the model and analysis.
2. Extensive experiments are conducted, with sufficient preliminary experiments, model performance evaluation, and analysis.
3. The dataset collected by the authors may be a good contribution to future research on streaming recommendation.

==================================================
(3) List (2 or 3) of the main negative points about this paper, i.e., arguments for rejection:

1. Some important details and clarifications are missing. For example, how is the prediction done? The authors only mention that training is done through negative sampling; more details, such as how h_{t-1} is used to obtain the prediction, are needed. For another example, the evaluation section needs more clarification: how is the padding achieved? There seems to be a typo in the sentence "we append interactions from the training and the training set". Furthermore, it says "A testing interaction falls into rep if the item appears in the testing input sequence"; is it supposed to be training instead of "testing input sequence"?

2. Some important related work is missing, some of which could potentially be used as baseline approaches. To name a few recommendation papers that consider temporal dynamics and repeated consumption:
- Streaming Recommender Systems, by Chang et al.
- Time Matters: Sequential Recommendation with Complex Temporal Information, by Ye et al.
- Déjà vu: A Contextualized Temporal Attention Mechanism for Sequential Recommendation, by Wu et al.

Furthermore, is there any reason that [1] cannot be used as a baseline for performance comparison? It seems to be very related to the proposed approach.

==================================================
(4) Detailed review:

In this section, discuss your overall opinion about the merits of the work. Consider issues of replicability, significance, value to the community, novelty, etc. You may want to include additional sections on:

(4a) Suggested revisions

1. It may be better to explain the variants before discussing their performance.
2. Comparing LiveRec + av with LiveRec + rep + av, it seems that adding the rep component compromised the model's ability to recommend new items (worse H@1-new for LiveRec + rep + av, and the same for H@10). I wonder whether, on other datasets where new items outnumber repeated ones, adding the rep component would lead to worse recommendation performance in general. Some discussion of this would be helpful.
3. This paper fixes the dimensionality of all attention-based models to 128. It is worth searching over a bigger range for this hyperparameter, since it is important for attention-based methods.

(4b) Missing related work

Please see the second comment in (3).

(4c) Typos, formatting issues, etc.
Please see the first comment in (3).

(4d) Appropriateness of paper length

==================================================
(5) Any further details about your specific expertise or perspective on this topic:

Human Participants Research: Human participants research but would not require review
Ethical considerations: (blank)

Review Rating
Borderline: Overall I would not argue for accepting this paper.

Reviewer 2 (PC)
Expertise: Knowledgeable

Your Review

==================================================
(1) Summarize in a few sentences the contribution, and its relevance to recommender systems:

The paper studies a really interesting and relevant topic in recommender systems: recommendations on live streaming platforms. In this scenario, the traditional definition of items is no longer viable (streams are ephemeral by definition) and would lead to an extreme sparsity problem. On the other hand, streamers can be considered as items. This opens new and interesting challenges due to 1) repeated consumption (users repeatedly watch new content from the streamers they like most) and 2) availability (not all streamers are live at the same time, so negative feedback can be due to either lack of relevance or lack of availability). The paper analyses these two challenges in depth and proposes a sequence-based end-to-end architecture for recommending streamers to users, with dedicated components to model availability and repetition.

==================================================
(2) List (2 or 3) of the main positive points about the paper, i.e., arguments for acceptance:

* The paper is overall well written. The reader is gently introduced to the challenges induced by repeated consumption and availability through preliminary experiments, which highlight the importance of negative sampling strategies in this domain.
* The proposed architecture introduces an interesting mix of sequence encoding through Transformers, availability modeling through self-attention, and temporal embeddings to account for the distance between repeated consumptions of items.
* The experimentation seems quite robust. The authors went through a process of collecting a large dataset from Twitch using the public API, and will release the datasets to help other researchers.

==================================================
(3) List (2 or 3) of the main negative points about this paper, i.e., arguments for rejection:

* While releasing the dataset is definitely appreciated, in order to have complete reproducibility the authors should release the source code of their experiments as well.

==================================================
(4) Detailed review:

(4a) Suggested revisions

In the experimentation:
* I would describe the meaning of the labels (rep) and (av) before Table 1, rather than having the reader go through Section 6.4 to understand their meaning.
* In Sec 6.4, what metric is used to compute the relative improvement between SASRec and LiveRec?
* In Sec 6.5, I found the insights of Questions 4 and 5 hard to interpret. For example, how do the temporal embeddings <12h differ from the others? What kind of dynamics are they capturing? What are the visible patterns for intervals less than 1 week?
Analogously, the analysis of the query-key weights in Q5 is hard to interpret.

(4b) Missing related work

Nothing to highlight.

(4c) Typos, formatting issues, etc.

(4d) Appropriateness of paper length

The paper length seems just right.

==================================================
(5) Any further details about your specific expertise or perspective on this topic:

Human Participants Research: Does not report on human-participants research
Ethical considerations: (blank)

Review Rating
Probably accept: I would argue for accepting this paper.

Reviewer 3 (PC)
Expertise: Knowledgeable

Your Review

(1) Summarize in a few sentences the contribution, and its relevance to recommender systems:

The research focuses on recommending live-streaming content, bearing in mind such peculiar properties as availability and repeat consumption. The authors validate their model on a Twitch dataset that they plan to release with this paper.

==================================================
(2) List (2 or 3) of the main positive points about the paper:

The paper is relevant to RecSys and proposes LiveRec, a model based on SASRec that recommends items (streamers) that are both available at the current time and relevant according to the user's previous history. The method is compared to a few baselines and is shown to outperform them according to ranking metrics.

==================================================
(3) List (2 or 3) of the main negative points about this paper:

The authors plan to release two datasets: one is a benchmark and the other is a full one, but they performed their experiments only on the smaller one, meaning that it is not clear how their method works on the bigger dataset.

==================================================
(4) Detailed review:

The paper discusses an interesting problem, relevant to RecSys, of live-stream recommendation. Temporal availability should be considered, as well as repetitions.
The model was validated on the Twitch dataset the authors created and compared to a few baseline models (POP, MF-BPR, FPMC, SASRec, BERT4Rec) using ranking-related metrics such as Hit@1, Hit@10, and NDCG@10, showing that the proposed approach outperforms these baselines.

I liked the analysis part, where the authors analyze their model through a Q&A section, though I feel that sequence statistics on the dataset are missing here. I also think that streamers' dynamics depend on their time zone and age as well, and may differ on weekends and holidays (relative to weekdays); however, this depends on the features that exist in the dataset, and I am not sure such data is provided.

This paper can be accepted as a short paper for the conference, considering the suggested improvements below. In my opinion, for the sake of the reader it would be better to mention that SASRec shifts the sequence to predict the next item; this makes it easier to understand the sequences split every 24 hours, where each time the next item at T+1 is predicted. It would also be good to add a snapshot of the dataset, showing what the first few rows of the data look like.

As far as I understood from the data description, the data was sampled every 10 minutes (the interval between rounds). However, I still do not understand exactly what the total number of time steps of the dataset means. Is it num_of_rounds x 10 minutes? Also, in the methods part, when the authors talk about a fixed length l, do they mean 24 hours? I think the description is too general, more about SASRec than about their model; it should be adapted to their model.

Figure 12: a color explanation would help readers understand it better, as would one for Figure 15. Page 4: it should be Figure 5 instead of Figure 11. The authors should also be consistent in referring to figures, since sometimes they write "Figure" and sometimes "fig". I was also wondering why the authors do not share their code together with the dataset, for better reproducibility.
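To make the shift-by-one suggestion above concrete for the authors, here is a minimal sketch of how SASRec-style next-item targets are typically constructed from a user's interaction sequence. This is an illustrative example only, not the paper's code; the function name `make_shifted_targets`, the padding id 0, and the example item ids are all hypothetical.

```python
def make_shifted_targets(sequence, max_len):
    """Truncate a user's interaction sequence to the most recent items
    and shift it by one, so the target at position t is the item
    consumed at t+1 (SASRec-style next-item prediction).
    Hypothetical sketch; 0 is assumed to be the padding id."""
    seq = sequence[-(max_len + 1):]  # keep the max_len+1 most recent items
    inputs = seq[:-1]                # items at positions 0 .. T-1
    targets = seq[1:]                # items at positions 1 .. T (shifted)
    pad = max_len - len(inputs)      # left-pad short sequences
    return [0] * pad + inputs, [0] * pad + targets

# Example: with a history of streamer ids [5, 9, 9, 3, 7] and max_len=4,
# each input position is trained to predict the next watched streamer.
inputs, targets = make_shifted_targets([5, 9, 9, 3, 7], max_len=4)
# inputs  -> [5, 9, 9, 3]
# targets -> [9, 9, 3, 7]
```

Under this reading, splitting the data every 24 hours would simply determine which interactions form `sequence` before the shift is applied.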
Human Participants Research: Does not report on human-participants research
Ethical considerations: (blank)

Review Rating
Probably accept: I would argue for accepting this paper.