####################################################################################################
CIKM (accept)
####################################################################################################

Masked Reviewer ID: Assigned_Reviewer_1

Review:
Is the submission relevant to the KM track? Yes
What do you think of the ideas in the submission? Incremental
Is the writing clear? Yes
Overall recommendation: -2 (Marginal)

How to improve the submission (for the camera-ready, if accepted, or for resubmission to another venue, if rejected)? List your three suggestions.
Please take a look at the detailed review.
1. All of the proposed approaches use K* times more parameters than the baselines. In order to make a fair comparison, please use the same number of parameters for both the baseline and the proposed approaches.
2. The technical novelty in the paper is fairly limited. The AUC loss seems to be a direct extension of BPR (but we didn't see how it performed in the experiments). The only bit that is new and interesting is the KL divergence idea.

Detailed review:
The authors extend the traditional latent factor model by keeping a matrix (which they call a projection matrix) for each user, instead of a user factor vector. The motivation is that this lets the model capture user preferences along multiple dimensions. The affinity between a user and an item can now be expressed as a vector, obtained by multiplying the user projection matrix with the item factor (see the sketch after this review). The authors extend the previously proposed ranking objectives (which compared two scalars) to compare these vectors. They propose three interesting ranking-based metrics: one based on the KL divergence between the vectors, the others extensions of Steffen Rendle's BPR algorithm. The authors empirically show the benefits of using the user matrix on several real datasets. The KL divergence criterion was empirically shown to be the best.

Good:
1. The idea of using a matrix to represent the user factor is nice. However, the most novel aspect here is developing the ranking metrics to compare the vectors. I especially like the idea of using KL divergence here.
2. The paper is well written and a pleasure to read.

Weak:
1. All of the proposed approaches use K* times more parameters than the baselines. In order to make a fair comparison, please use the same number of parameters for both the baseline and the proposed approaches.
2. The technical novelty in the paper is fairly limited. The AUC loss seems to be a direct extension of BPR (but we didn't see how it performed in the experiments). The only bit that is new and interesting is the KL divergence idea.

Details:
1. KL divergence is asymmetric. Does this cause any problem? Why not use any of the symmetric versions of the KL divergence?
2. Why are the numbers missing for the PFP AUC loss? Is it worse than the BPR strategy? It would be interesting to compare the two methods, especially since PFP can be thought of as a direct generalization with more parameters.
3. How would you suggest that the parameter K* be chosen? Would this be based on the number of positive feedbacks per user?
4. Section 2.2: bayesian -> Bayesian
5. The PFP methods outperform the others in terms of AUC, but they are actually worse off in terms of nDCG/precision metrics. Why do you think that is?
6. Is AUC even a reasonable metric for a real-world problem? It seems that Precision at K with a fairly small K should be more useful. In that regard, PFP is actually worse off than the baseline algorithms.
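To make the scheme concrete, here is a minimal NumPy sketch of the prediction step as the review describes it. The shapes, the clicked-item centroid, and the scalar readout are illustrative assumptions: the review only states that user-item affinity becomes the vector P^u v_i, and the paper's exact Eq. (4) may differ.

```python
import numpy as np

# Sketch of PFP-style prediction: a per-user projection matrix replaces the
# usual user factor vector, so user-item affinity is a vector, not a scalar.
K = 10        # item factor dimensionality
K_star = 10   # projected ("preference") dimensionality

rng = np.random.default_rng(0)
P_u = rng.normal(size=(K_star, K))   # per-user projection matrix (K* x K parameters)
v_i = rng.normal(size=K)             # latent factor of a candidate item
V_pos = rng.normal(size=(5, K))      # factors of items the user has clicked

pref_vec = P_u @ v_i                 # K*-dimensional preference vector

# One plausible scalar readout (an assumption, not the paper's Eq. (4)):
# compare against the projected centroid of the user's clicked items.
v_hat = P_u @ V_pos.mean(axis=0)
score = float(pref_vec @ v_hat)
print(pref_vec.shape, score)
```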
Masked Reviewer ID: Assigned_Reviewer_2

Review:
Is the submission relevant to the KM track? Yes
What do you think of the ideas in the submission? Some novelty
Is the writing clear? Yes
Overall recommendation: +3 (Should accept)

How to improve the submission (for the camera-ready, if accepted, or for resubmission to another venue, if rejected)? List your three suggestions.
See detailed feedback.

Detailed review:
The authors model the one-class recommendation problem by assigning a K*-dimensional preference vector to each user-item pair, instead of the traditional single numeric score. The preference vector is derived from latent factors U and V for users and items, along with a personal projection matrix P^u. The authors go on to define several optimization criteria and give efficient algorithms to solve them. The experiments on real-life datasets show pretty significant lifts from the new scheme.

I like the paper overall; it is very well written and easy to follow. The authors also give intuitive explanations that help in understanding the concepts: e.g., the discussion at the end of Section 3.1 is very helpful in understanding the costs and the \hat{v_j} vectors.

A few improvements:
- There is a typo in equation (2); IMO there should be a \sum_{u,i,j} before the max.

Masked Reviewer ID: Assigned_Reviewer_3

Review:
Is the submission relevant to the KM track? Yes
What do you think of the ideas in the submission? Incremental
Is the writing clear? Yes
Overall recommendation: -2 (Marginal)

How to improve the submission (for the camera-ready, if accepted, or for resubmission to another venue, if rejected)? List your three suggestions.
1. Explain the choices of the three loss functions.

Detailed review:
In this paper, the authors propose a personalized feature projection method for the one-class recommendation problem. The method learns users' projection matrices and items' factors to make recommendations. The authors also propose three loss functions for the one-class setting. Experiments on four data sets show the effectiveness of the proposed models. The following are some major concerns:
1. The personalized feature projection method is interesting. However, it has already been proposed by MaxMF [39]; the authors mainly propose a different way to make use of the projected features.
2. The authors propose three loss functions or evaluation criteria, but some explanation of which one is suitable for which situation would be better.

####################################################################################################
SIGIR (reject)
####################################################################################################

------------- Review from Reviewer 1 -------------
Relevance to SIGIR (1-5, accept threshold=3): 5
Originality of Work (1-5, accept threshold=3): 4
Technical Soundness (1-5, accept threshold=3): 2
Quality of Presentation (1-5, accept threshold=3): 4
Impact of Ideas or Results (1-5, accept threshold=3): 3
Adequacy of Citations (1-5, accept threshold=3): 4
Reproducibility of Methods (1-5, accept threshold=3): 4
Overall Recommendation (1-6): 3

-- Comments to the author(s):

--- Preamble ---
This is the meta-review by the Primary PCM responsible for your paper; it takes into account the opinions expressed by the referees, the subsequent decision thread, and my own opinions about your work.
--- Synthesis ---
The reviewers all agreed that the paper is well written and that the central idea, using a more complex relationship between users and items, is novel and interesting. While there was some discussion of the experimental part, most of the discussion and criticism focused on the model itself, as described by equation (4), which reduces to (almost) standard matrix factorization, as noted by reviewer #1. It is thus unclear where the difference between the proposed approach and the baselines comes from: is it this particular functional form, or the losses used?

--- Additional comments ---
I think that, to improve the paper, the experiments and the analysis should focus on showing why representing the user by a matrix improves the results. Reading the paper, I also wondered why no validation set was used to determine the best K for the various methods before comparing them. Also, since K=10 has a very different meaning in MaxMF and in your approach, with 10 free parameters versus 100 for your approach when representing the user, it seems unfair to compare the two.

-- Summary:

--- Final disposition ---
Given the concerns about the model, which were not cleared up by the experiments conducted in the paper, there was a consensus towards rejecting the paper, but with a clear encouragement to clarify the model and conduct further experiments to understand what is going on.

---------- End of Review from Reviewer 1 ----------

------------- Review from Reviewer 2 -------------
Relevance to SIGIR (1-5, accept threshold=3): 4
Originality of Work (1-5, accept threshold=3): 4
Technical Soundness (1-5, accept threshold=3): 2
Quality of Presentation (1-5, accept threshold=3): 3
Impact of Ideas or Results (1-5, accept threshold=3): 3
Adequacy of Citations (1-5, accept threshold=3): 3
Reproducibility of Methods (1-5, accept threshold=3): 3
Overall Recommendation (1-6): 3

-- Comments to the author(s):
As Secondary PCM I have reviewed this submission, the reviews, and the subsequent discussion, and I concur with the recommended decision.

-- Summary: n/a

---------- End of Review from Reviewer 2 ----------

------------- Review from Reviewer 3 -------------
Relevance to SIGIR (1-5, accept threshold=3): 3
Originality of Work (1-5, accept threshold=3): 3
Technical Soundness (1-5, accept threshold=3): 2
Quality of Presentation (1-5, accept threshold=3): 3
Impact of Ideas or Results (1-5, accept threshold=3): 2
Adequacy of Citations (1-5, accept threshold=3): 4
Reproducibility of Methods (1-5, accept threshold=3): 3
Overall Recommendation (1-6): 2

-- Comments to the author(s):
This paper looks at the standard recommender systems problem. The authors extend the standard Matrix Factorisation (MF) model by estimating a "projection matrix" per user (rather than a standard user factor vector). They motivate this change as a way to "capture complex relationships between items' properties and users' preferences".

I found this paper totally confusing, and not because it was badly written. In fact I thought the presentation of related work in Section 2 and of model estimation under different loss functions in Section 3.2 were particularly well written. The reason I found it so confusing is that I didn't understand why the model was any different from the standard Matrix Factorisation model. How could introducing a "projection matrix" which performs a linear transformation of the item factor vector result in anything but a linear (i.e. inner-product based) ratings prediction model?
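That objection can be written out explicitly. A sketch, under the assumption that the paper's scalar score is some fixed linear readout a_u of the K*-dimensional preference vector P^u v_i (the exact aggregation in the paper's Eq. (4) may differ):

```latex
% If the score is any linear functional a_u of the projected vector:
\hat{r}_{ui}
  = a_u^{\top} \left( P^u v_i \right)
  = \left( (P^u)^{\top} a_u \right)^{\top} v_i
  = \gamma_u^{\top} v_i ,
\qquad \gamma_u := (P^u)^{\top} a_u \in \mathbb{R}^{K} .
% The model collapses to an inner product between an effective user vector
% gamma_u and the item factor, i.e. a standard MF predictor; only the losses
% and the regularization structure can then differ from MF.
```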
The discussion of (and implied correspondence with) non-linear models in the Related Work section further fuelled this confusion. And I didn't understand why this "new model" could outperform the competitive approaches in the experiments. I don't see the point of estimating a projection matrix if the intention is to use the resulting linear model for prediction, as given in Equation 4. I note that the equation can be rewritten as the standard inner product of user and item factor vectors, meaning that the prediction model is indeed exactly the same as a standard MF model. As for the claimed performance improvements, I think the different regularisation structure (resulting from the estimation of more parameters in what is effectively an extremely overparameterised model) is what changes the parameter estimates, and that is the only reason I can see why performance might improve.

I think the idea that P^u(P^u)^T defines a "personalized metric used to measure the preference difference between items" is interesting. It would be nice to understand whether this matrix really does offer a means for determining a user's perception of distance in the item space (this reading is spelled out in the note after this review). And I really like the suggested future work in Section 5: "Employing contextual information (such as text) in order to understand the meaning behind the projected latent factor". In fact I would go so far as to say that the current paper should include such an analysis in order to provide a more meaningful contribution.

There is some inconsistency in the notation used in different parts of the paper. "N_u" is used to denote the negative (non-clicked) items for user "u" in Section 3.2.1 and a regularisation constant in Section 3.2.3. "K" is used as the factor count in Section 3.1 and the sample count in Section 3.2.2, which is slightly confusing for the complexity analysis in Section 3.3.

Regarding the experiments: why are so few latent factors being estimated? Is it because the algorithm simply doesn't scale? Usually experiments with matrix factorisation models involve 50, 100 factors or more; in this case experiments are limited to 1 to 10 factors.

Comments/Questions:
- I think the title of Section 2.2 should be more specific, given the title of Section 2.1.
- Many references in Section 2.3 are missing "et al."
- Equation 7 is missing the "ln" term before P.
- The headings for Sections 4, 4.1 and 4.1.1 have no text between them.
- I was puzzled by the comment in Section 4.1.1: "there are no model hyperparameters so a validation set is not required". The "projection constraint terms" and the "number of factors" are certainly hyperparameters IMO.
- In Section 4.2 the authors mention "these two datasets" in the context of Table 2, but it is not clear which two of the four they are referring to.

Typos:
- Above Equation 19: "the constrains" => "the constraints"

-- Summary:
- The paper is well written, but the model doesn't make sense to me.
- Experiments were performed with a very small number of factors, and the reason for the performance improvements is not clear IMO.

---------- End of Review from Reviewer 3 ----------
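An aside on Reviewer 3's "personalized metric" reading, spelled out as a sketch (whether the Gram matrix appears as (P^u)^T P^u or P^u(P^u)^T depends on the paper's orientation convention for P^u):

```latex
% Distance between two items as perceived by user u:
d_u(i,j)^2
  = \left\lVert P^u v_i - P^u v_j \right\rVert_2^2
  = (v_i - v_j)^{\top} (P^u)^{\top} P^u \, (v_i - v_j) .
% The projection's Gram matrix acts as a user-specific positive semi-definite
% (Mahalanobis-type) metric on the item factor space: a genuinely per-user
% geometry, even where the scalar scores collapse to a linear model.
```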
------------- Review from Reviewer 4 -------------
Relevance to SIGIR (1-5, accept threshold=3): 4
Originality of Work (1-5, accept threshold=3): 4
Technical Soundness (1-5, accept threshold=3): 2
Quality of Presentation (1-5, accept threshold=3): 4
Impact of Ideas or Results (1-5, accept threshold=3): 4
Adequacy of Citations (1-5, accept threshold=3): 4
Reproducibility of Methods (1-5, accept threshold=3): 4
Overall Recommendation (1-6): 4

-- Comments to the author(s):
You get a bonus review; I reviewed this paper as well as my colleague. The numeric scores are based on my review; I did the review independently of my colleague, who is an expert in the area. Skip to [2] to see his review.

[1] This work re-examines latent factor models, introducing a personalized feature projection (PFP) matrix as a means of producing personalized recommendations. The authors show how the PFP can be trained against three successively complex metrics: AUC loss, WARP loss, and a generalized vector function. The authors experiment on four datasets to show the efficacy of their methods.

The authors propose a simple method of introducing a PFP to induce a personalized vector-based ranking, an improvement over the previous work of Weston et al. [39], which used only the maximal match between item and user vectors (the two scoring schemes are contrasted in the sketch below). This is a simple fix that introduces the ability to account for smaller-but-definite latent preferences. The authors apply this method to one-class recommendation, where they assume previously unviewed items (negatively rated) should be weighted less than viewed items (positively rated). The authors also show that their vector-based technique reduces to known methods when the vector size is set to 1, and explain this well, such that even novices can follow the argumentation.

The main difficulty I had with the paper was with its basic premise of introducing the PFP matrix. It's not clear that this method differs significantly from the original MF approach (perhaps I'm missing something obvious?). If that is true, then the demonstrated improvements are largely due to the loss functions, which invalidates a considerable portion of the paper, as both the AUC and WARP losses do not do well. The paper's contribution then reduces to the introduction of a single vector loss function. Due to this lack of understanding, I had to lower the technical soundness score.

Another problem I had with the paper is the evaluation. The experimental set-up states that the data was split into training and testing sets but doesn't provide enough detail on how. You describe how it was split for each user u, but what about the rest of the dataset not specific to u? Again, something probably obvious. I believe your datasets are largely not just views and non-views, but complete with ratings. On page 7, you claim that they do not have ratings. You need to be clear about whether you post-processed the datasets to remove rating information in order to reduce the task to a one-class problem.

The discussion is also a little lacking; the paper ends abruptly without much summary of the method's main points. Importantly, Figures 3 and 4 consume a lot of space but relatively little discussion is devoted to them. The curves for each dataset and metric are not very uniform, and a more detailed explanation (or at least a hypothesis) would serve the readership well. Otherwise, this is a well-written and well-argued paper with a simple method and useful results.
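A sketch contrasting the two scoring schemes mentioned above; the parameter shapes are illustrative, with the MaxMF-style score following Weston et al.'s max-match idea while the PFP-style vector retains every projected component:

```python
import numpy as np

rng = np.random.default_rng(1)
K, T = 10, 4
U_t = rng.normal(size=(T, K))   # MaxMF-style: T interest vectors per user
P_u = rng.normal(size=(T, K))   # PFP-style: projection matrix (same parameter count here)
v_i = rng.normal(size=K)

maxmf_score = float((U_t @ v_i).max())  # non-linear: only the best-matching interest counts
pfp_vector = P_u @ v_i                  # linear map: smaller-but-definite components survive
print(maxmf_score, np.round(pfp_vector, 2))
```

The max is what makes MaxMF non-linear; the PFP vector instead preserves the weaker preference components that a max would discard, which is the "smaller-but-definite" point above.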
Overall, I liked this paper and would like to see it in the conference.

Minor comments (X, Y gives the page and approximate line number, when applicable):
- Somehow the hyphenation sometimes looks odd, with a single character dangling: on page 2, PF-P, differen-t and dataset-s.
- Somewhere you could discuss the choice of uniform sampling. Would it be worthwhile to sample differently than uniformly during different phases of training (beginning, ending)?
- 4, 30: You can put together all of the separate references into a single block, as you do elsewhere.
- 5, 20: For the WARP loss, it would be useful to assess how your training sampling varies over iterations (does it get more difficult to find a violating instance?). If this is answered in previous work, you can ignore this comment or add an appropriate citation.
- 5, 50: It's good that you mentioned that KL divergence is asymmetric. You introduced other possible similarity measures, but why choose one that is inherently asymmetric if that is not a characteristic needed in your work? Perhaps adding a short sentence of justification would be helpful (a symmetric alternative is sketched below).
- 6, 1: For the datasets mentioned, it would be helpful to provide a citation to the papers that use each dataset. You can add it as a row in Table 2.
- 7, Fig 1: Please include the PFP-AUC results. It is useful to benchmark against other AUC-optimizing methods, even when it is always bettered by PFP-WARP. You could also leave some vertical space or a rule to more easily distinguish your methods from the baselines.
- 7: Figure 2 is pretty hard to read; it is not legible since both the font and the trend are too small to see. Perhaps also standardize the order of your datasets, as it differs between pages 6 and 7.
- 7, 30: "PFP (KL)-)-" (remove the extra paren)
- 8, Figures 3 and 4: There is a lot of repetitiveness in the axis labels and subfigure captions. Perhaps remove some so that the figures can be larger. Also note that your y-axis ranges differ per subfigure.
- 8: The exception for LThing in K* should be explained better.
- 8, 25: "we conclude-s-"
- 9: Tables 4-7 could be better presented with relative amounts (e.g., +0.02). Why do you only have AUC results and not nDCG?
- 10: The references seem to be loosely spaced, so you have room to add additional material if needed.

--

[2] In this paper, the authors propose a personalized (latent) feature projection method to model users' preferences over items for one-class recommendation. The idea is to first project each item's latent features into the target user's latent space, and then predict the preference score in the user's personalized projected space. Experiments on four datasets show that the proposed method outperforms state-of-the-art methods such as CofiRank and Bayesian Personalized Ranking.

The main strength of the proposed method lies in its generalizability and extensibility. The proposed prediction model (PFP) is a simple and intuitive way to incorporate the idea of feature projection into a latent factor model. Three ranking-based loss functions are demonstrated to optimize the prediction model: AUC loss, WARP loss and KL-divergence loss.

The major concern is that the effect of feature projection is questionable. Of the three proposed methods that implement the PFP model with different loss functions (PFP-AUC, PFP-WARP and PFP-KL), only PFP-KL shows improved performance over the state-of-the-art methods. The authors should provide more error analysis of the PFP-AUC and PFP-WARP methods.
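On the asymmetry point raised at 5, 50 (and echoed by the CIKM and WWW reviewers), here is a sketch of the kind of symmetric variant that could be ablated. Normalizing preference vectors with a softmax is an assumption here, made only so that the vectors are valid distributions:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q); asymmetric in its arguments.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js(p, q):
    # Jensen-Shannon divergence: a symmetric, bounded smoothing of KL.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = softmax(np.array([1.0, 0.2, -0.5]))  # e.g. normalized preference vector, clicked item
q = softmax(np.array([0.1, 0.3, 0.4]))   # e.g. normalized preference vector, non-clicked item
print(kl(p, q), kl(q, p), js(p, q))      # kl(p,q) != kl(q,p); js(p,q) == js(q,p)
```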
In addition, the authors only show the results of PFP-WARP and PFP-KL, while PFP-AUC's results are omitted (due to its weak performance). However, PFP-AUC's results are important to show, as it is a direct extension of Bayesian Personalized Ranking (BPR) under the proposed feature projection framework. It would be instructive to show its performance and compare it with BPR.

It is good to show experiments on four datasets; however, the authors should provide more micro-level experiments for each dataset. For example, as the PFP model averages the rated items' similarity for the target user when making predictions, performance with respect to users with different numbers of rated items should be presented. A research question to answer here is how the model performs for sparse users and items. Moreover, in Figure 4, why does the performance of PFP-WARP not change with the number of projected factors? I see no reason why PFP-WARP should be insensitive to the number of projected factors.

Some comments to make the paper clearer to the reader:
1) In Section 3.2.3, it is unclear how to "project the parameters back" when the parameters violate the constraints at an update step.
2) In Eq. (21), NDCG is defined at position K (i.e. NDCG@K), but the authors do not report the value of K used in the experiments.
3) No significance tests are reported. The authors should report the significance level when the improvements are slight, such as 0.2501 vs. 0.2449 in Table 3.
4) In the Table 3 performance comparison, the authors should report the detailed parameter settings, i.e., the number of latent factors and the projection constraints.
5) In Figures 3 and 4 (line charts), the lines are blocked by the box; the box should be moved so as not to cover the lines.

Overall, this paper is well written and easy to follow. The idea is well motivated and the proposed method is simple. However, more micro-level experiments and analysis should be presented to better understand the effectiveness and properties of the proposed method.

-- Summary:
1. Simple method.
2. Well written.
3. Comprehensive experiments and some discussion of parameter effects.

-- (The review below is a placeholder; it is from a secondary reviewer, a senior doctoral student in my group.)
1. Well-motivated proposal.
2. Simple model; easy to infer, implement and extend.
3. More micro-level experiments and analysis should be presented.

---------- End of Review from Reviewer 4 ----------

------------- Review from Reviewer 5 -------------
Relevance to SIGIR (1-5, accept threshold=3): 5
Originality of Work (1-5, accept threshold=3): 4
Technical Soundness (1-5, accept threshold=3): 4
Quality of Presentation (1-5, accept threshold=3): 5
Impact of Ideas or Results (1-5, accept threshold=3): 4
Adequacy of Citations (1-5, accept threshold=3): 4
Reproducibility of Methods (1-5, accept threshold=3): 4
Overall Recommendation (1-6): 4

-- Comments to the author(s):
This is a very instructive and clear paper on training latent factor models for personalized one-class recommendation. The proposed models for feature projection evaluate three objective functions that are relevant and sound. It is interesting how assumptions about users' preferences are modeled in a max-margin training approach. There is an exhaustive evaluation on several recommendation datasets.

When optimizing the models (e.g., with the WARP loss), would it be advantageous to use data- or user-specific constraints when sampling the negative examples?
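That sampling question connects to how WARP estimates ranks during training. Below is a sketch of the standard WARP sampling loop in the style of Weston et al.; the scoring function, margin, and trial cap are abstractions, not the paper's exact procedure. Late in training, violators become rare, so more trials are needed and the update weight shrinks:

```python
import numpy as np

def warp_weight(n_items, n_trials):
    # Rank estimated from the number of uniform draws needed to find a violator,
    # converted to a weight via the truncated harmonic series Phi(k) = sum 1/m.
    est_rank = max((n_items - 1) // n_trials, 1)
    return sum(1.0 / m for m in range(1, est_rank + 1))

def sample_violator(score_pos, score_fn, candidates, rng, margin=1.0, max_trials=100):
    # Draw negatives uniformly until one violates the margin against the positive.
    for trial in range(1, max_trials + 1):
        j = int(rng.choice(candidates))
        if score_fn(j) > score_pos - margin:
            return j, warp_weight(len(candidates), trial)
    return None, 0.0  # no violator found: skip the update

rng = np.random.default_rng(2)
neg_scores = rng.normal(size=1000)  # stand-in for model scores of negative items
j, w = sample_violator(1.5, lambda j: neg_scores[j], np.arange(1000), rng)
print(j, w)
```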
As with any dimensionality reduction model, it is important to select a proper value of K (the number of factors). In this sense the discussion of the K values in Section 3.1 is very pertinent. Are there specific characteristics of the data or users that can influence this choice?

-- Summary:
I like the paper very much: it is written in a clear and didactic way. The assumptions used in the learning models are very pertinent. It could have influence beyond the recommendation field.

---------- End of Review from Reviewer 5 ----------

####################################################################################################
WWW (reject)
####################################################################################################

----------------------- REVIEW 1 ---------------------
PAPER: 101
TITLE: Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation
AUTHORS: Tong Zhao, Julian McAuley and Irwin King

OVERALL RECOMMENDATION: 1 (Good paper: The paper should be accepted, but I will not champion it)
REVIEWER EXPERTISE: 4 (Expert: Expert in this problem area)
Originality of work: 3 (Creative: Few people in our community would have put these ideas together)
Potential impact of results: 3 (Broad: Could help ongoing research in a broader research community)
Quality of execution: 3 (Reasonable: Generally solid work, but certain claims could be justified better)
Quality of presentation: 3 (Reasonable: Understandable to a large extent, but parts of the paper need more work)
Adequacy of citations: 3 (Reasonable: Coverage of past work is acceptable, but a few papers are missing)

----------- PAPER SUMMARY -----------
This paper proposes a novel method called personalized feature projection (PFP) for the well-known one-class recommendation problem. Technically, the most important and novel part is the preference prediction function shown in Eq. (4), where a user-dependent preference vector is replaced by a user-dependent preference projection matrix (i.e., P^u). With this prediction function, three optimization metrics are adopted to learn the latent parameters: AUC loss, WARP loss, and KL divergence, as described in detail in Section 3.2. Empirical studies on five real-world data sets show some improvement of the proposed method over some state-of-the-art methods.

----------- REASONS TO ACCEPT -----------
1) The preference prediction function shown in Eq. (4) is novel.
2) Embedding the prediction function in three optimization metrics is a plus.

----------- REASONS TO REJECT -----------
1) Using a user-dependent preference matrix will increase the time and space complexity. The authors may include some analysis and discussion of the complexity issue (made concrete in the note below).
2) The authors may discuss the relationship between Eq. (4) and the prediction functions in FISM [8] and SVD++ [KDD 08, Yehuda Koren].
3) The detailed parameter search and settings for the baselines and for PFP-WARP and PFP-KL, as shown in Table 3, should be reported.
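The complexity point can be made concrete with a back-of-the-envelope count (a sketch; K is the item factor size and K* the projected dimension, i.e. the "L" of Review 2 below):

```latex
% Parameter counts for |U| users and |I| items:
\#\text{params}_{\mathrm{MF}}  = K \left( |U| + |I| \right) , \qquad
\#\text{params}_{\mathrm{PFP}} = K^{*} K \, |U| + K \, |I| .
% Per-prediction cost: O(K) for MF versus O(K^{*} K) for PFP.
% E.g. K = K* = 10 stores 100 parameters per user instead of 10.
```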
----------- COMMENTS FOR AUTHORS -----------
Some comments/suggestions for improvement:
1) [38] is the most similar work, as mentioned by the authors in Section 1. The authors may include it as one major baseline in the empirical studies.
2) PFP is used with three optimization metrics. Readers expect to see the performance of all three variants of PFP, i.e., PFP-AUC, PFP-WARP and PFP-KL. However, the empirical studies only include two of them.
3) The organization of the empirical results can be improved; for example, the results in Figure 2 could be moved to around the last paragraph of Section 4.
4) It seems that the statement "All are available online" in Section 4.1.1 is not true. The page at http://www.public.asu.edu/~jtang20/datasetcode/truststudy.htm only contains Ciao and Epinions.
5) The organization of the related work in Section 1 and Section 2.2 can be improved to explicitly show the relationship between the proposed method and existing methods.
6) The AUC results on some data sets seem too low; for example, the AUC results of BPR and GBPR on Epinions are close to 0.5, which amounts to almost completely random recommendation.

----------------------- REVIEW 2 ---------------------
PAPER: 101
TITLE: Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation
AUTHORS: Tong Zhao, Julian McAuley and Irwin King

OVERALL RECOMMENDATION: -1 (Weak paper: This paper should be rejected, but I will not fight strongly against it)
REVIEWER EXPERTISE: 3 (Knowledgeable: Knowledgeable in this sub-area)
Originality of work: 3 (Creative: Few people in our community would have put these ideas together)
Potential impact of results: 2 (Limited: Impact limited to improving the state-of-the-art for the problem being tackled)
Quality of execution: 3 (Reasonable: Generally solid work, but certain claims could be justified better)
Quality of presentation: 4 (Lucid: Very well written in every aspect, a pleasure to read, easy to follow)
Adequacy of citations: 3 (Reasonable: Coverage of past work is acceptable, but a few papers are missing)

----------- PAPER SUMMARY -----------
The authors extend the traditional latent factor model by keeping a matrix (which they call a projection matrix) for each user, instead of a user factor vector. The motivation is that this lets the model capture user preferences along multiple dimensions. The affinity between a user and an item can now be expressed as a vector (multiplying the user projection matrix with the item factor). The authors extend the previously proposed ranking objectives (which compared two scalars) to compare these vectors. They propose three interesting ranking-based metrics, one based on the KL divergence between the vectors. They empirically show the benefits of using the user matrix on five real datasets.

----------- REASONS TO ACCEPT -----------
1. The idea of using a matrix to represent the user factor is nice. However, the most novel aspect here is developing the ranking metrics to compare the vectors. I especially like the idea of using KL divergence here.
2. The paper is very well written and a pleasure to read. I especially liked Section 3, which gradually introduces additional complexities into the model.
3. The experimental analysis is thorough, but it still doesn't establish whether the improvements could be obtained by simply using more factors.

----------- REASONS TO REJECT -----------
1. Using a matrix of size K x L (as opposed to K x 1) for the user factors essentially means that we are storing L times the number of parameters. This also implies that training time is going to increase L-fold. However, we don't see a significant improvement in accuracy in exchange for this linear increase in training and inference computation time.
2. Could we obtain similar results simply by using more factors for the users and items? In particular, if we used K·L factors for both users and items, would we see the same improvements in accuracy?
In Table 3, please include the number of factors that gave the best number for each of the algorithms. Without this information, it is hard to rule out that the benefits come simply from using a bigger factor count.
3. Please also compare against the max-match approach of Jason Weston et al. [38]. Since they use the max, they obtain a non-linearity that actually provides more benefit than using a bigger factor count.
4. The PFP methods outperform the others in terms of AUC, but they are actually worse off in terms of nDCG/precision metrics. Why do you think that is? Models with higher precision/recall metrics are much more suitable for use in a real-world recommender system.

----------- COMMENTS FOR AUTHORS -----------
1. KL divergence is asymmetric. Does this cause any problem? Why not use any of the symmetric versions of the KL divergence?
2. Why are the numbers missing for the PFP AUC loss? Is it worse than the BPR strategy? It would be interesting to compare the two methods, especially since PFP can be thought of as a direct generalization with more parameters.
3. How would you suggest that the parameter K* be chosen? Would this be based on the number of positive feedbacks per user?
4. Why is the WARP number worse than BPR's, since we can think of WARP as being strictly more generic than the BPR algorithm? (See the note below.)
5. Section 2.2: bayesian -> Bayesian
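On comment 4: in the pairwise view, the two objectives differ mainly in how violating pairs are weighted. A sketch using the standard formulations (here \hat{x}_{ui} denotes the model's score for user u and item i):

```latex
% BPR, an AUC surrogate, weights every violating pair (u,i,j) equally:
\max_{\Theta} \sum_{(u,i,j)} \ln \sigma \left( \hat{x}_{ui} - \hat{x}_{uj} \right)
  - \lambda \lVert \Theta \rVert^2 .
% WARP weights each violation by the estimated rank of the positive item:
\min_{\Theta} \sum_{(u,i,j)} \Phi \left( \mathrm{rank}_u(i) \right)
  \left| 1 - \hat{x}_{ui} + \hat{x}_{uj} \right|_{+} ,
\qquad \Phi(k) = \sum_{m=1}^{k} \tfrac{1}{m} .
% The rank must be estimated by sampling, which makes the gradient noisier;
% that variance is one plausible reason a "more generic" WARP objective can
% still train to a worse optimum than BPR in practice.
```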
----------------------- REVIEW 3 ---------------------
PAPER: 101
TITLE: Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation
AUTHORS: Tong Zhao, Julian McAuley and Irwin King

OVERALL RECOMMENDATION: -1 (Weak paper: This paper should be rejected, but I will not fight strongly against it)
REVIEWER EXPERTISE: 2 (Some familiarity: Generally aware of the area)
Originality of work: 2 (Conventional: Rather straightforward, a number of people could have come up with this)
Potential impact of results: 2 (Limited: Impact limited to improving the state-of-the-art for the problem being tackled)
Quality of execution: 2 (Poor: Potentially reasonable approach, but some claims lack justification)
Quality of presentation: 2 (Sub-standard: Requires a heavy rewrite to make the paper more readable)
Adequacy of citations: 2 (Inadequate: Literature review misses many important past papers)

----------- PAPER SUMMARY -----------
A personalized feature projection method to model users' preferences over items; the authors define a personalized projection matrix which takes the place of the user-specific factors of existing models.

----------- REASONS TO ACCEPT -----------
- Latent factor models transform both users and items into the same latent feature space.
- The authors define both users' and items' latent factors to be of the same size and use an inner product to represent a user's 'compatibility' with an item.
- Intuitively, users' factors encode 'preferences' while item factors encode 'properties', so that the inner product encodes how well an item matches a user's preferences.
- Each dimension of each user's opinion may depend on a combination of multiple item factors simultaneously.
- Each dimension of a user's preference is viewed as a personalized projection of an item's properties, so that the preference model can capture complex relationships between items' properties and users' preferences.

----------- REASONS TO REJECT -----------
- Better explanation of the mathematical formulas is needed.

----------- COMMENTS FOR AUTHORS -----------
What are the perspectives?

----------------------- REVIEW 4 ---------------------
PAPER: 101
TITLE: Improving Latent Factor Models via Personalized Feature Projection for One Class Recommendation
AUTHORS: Tong Zhao, Julian McAuley and Irwin King

OVERALL RECOMMENDATION: 0 (OK paper: I hope we can find better papers to accept)
REVIEWER EXPERTISE: 4 (Expert: Expert in this problem area)
Originality of work: 3 (Creative: Few people in our community would have put these ideas together)
Potential impact of results: 3 (Broad: Could help ongoing research in a broader research community)
Quality of execution: 3 (Reasonable: Generally solid work, but certain claims could be justified better)
Quality of presentation: 3 (Reasonable: Understandable to a large extent, but parts of the paper need more work)
Adequacy of citations: 3 (Reasonable: Coverage of past work is acceptable, but a few papers are missing)

----------- PAPER SUMMARY -----------
Meta review.
"This paper proposes a novel method called personalized feature projection (PFP) for the well-known one-class recommendation problem. Technically, the most important and novel part is the preference prediction function as shown in Eq.(4), where a user-dependent preference vector is replaced by a user-dependent preference projection matrix (i.e., P^{u}). With this prediction function, three optimization metrics are adopted to learn the latent parameters, including AUC-loss, WARP-loss, and KL."

----------- REASONS TO ACCEPT -----------
Meta review.
- "1) The preference prediction function as shown in Eq.(4) is novel." "The idea of using a matrix to represent the user factor is nice. However, the most novel aspect here is to develop the ranking metrics to compare the vectors." "Each dimension of each user's opinion may depend on a combination of multiple item factors simultaneously."
- "2) Embedding the prediction function in three optimization metrics gives a plus."
- "The paper is very well written and is a pleasure to read."

----------- REASONS TO REJECT -----------
Meta review: the paper should discuss cost/benefit.
- "1) Using a user-dependent preference matrix will increase the time and space complexity. The authors may include some analysis and discussion about the complexity issue." "Using a matrix of size 'K x L' (as opposed to 'K x 1') for the user factors essentially means that we are storing 'L' times the number of parameters. This also implies that our training time is going to linearly increase 'L' fold."
- "Could we obtain similar results simply by using more factors for the user and item? In particular, if we use K·L factors for both users and items, would we see the same improvements in accuracy?"

----------- COMMENTS FOR AUTHORS -----------
Meta review.
I generally liked this paper, and I think the ideas are relatively novel. The weakness is the lack of analysis of cost/benefit in terms of added complexity (is it worth it?) and of whether simply using a larger order (number of latent dimensions) might have given the same results. These are not deadly weaknesses, but they are somewhat important for gaining a better understanding of the proposed method.

------------------------- METAREVIEW ------------------------
There is no metareview for this paper.