============================================================================
META-REVIEW
============================================================================

Comments: Reviewers are overall happy with the paper. The main concerns seem minor. They relate to the evaluation and call for more analysis (better error analysis, ablation studies), which the authors are ready to address in the extra page; such gaps are also somewhat understandable for a short paper.

============================================================================
REVIEWER #1
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
This paper is about automatic editing of recipes, based on ingredients that the users want to add or remove from the recipe, introducing an unsupervised approach.

The main strength of the paper is that it presents an algorithm that seems to outperform previous approaches in the field regarding "serendipity", coherence, correctness and relevance. The authors also clearly explain how their method differs from previous work.

The main weakness is in the evaluation. I would have liked to see more of an error analysis of the results, showing in what contexts the algorithm fails, the reasons for this, and how this could possibly be improved. (And conversely, in what contexts the algorithm works really well.) Related to this, I think it would be good to add manual evaluation by experts, since Mechanical Turk is not always very reliable.
---------------------------------------------------------------------------

Computationally-aided linguistic analysis: No
NLP engineering experiment paper: Yes
Reproduction paper: No
Position paper: No
Resource paper: No

Reasons to accept
---------------------------------------------------------------------------
An interesting topic that has not been heard of much, and for which the authors present a seemingly successful method.
---------------------------------------------------------------------------

Reasons to reject
---------------------------------------------------------------------------
The paper presents a rather niche subject, meaning that the results might not be directly useful for many other researchers.
---------------------------------------------------------------------------

Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
I wonder whether 'serendipitous'/'serendipity' is really the correct term to use? I understand that it has to do with positive surprises in your evaluation, but when you write that humans deem the recipes "more serendipitous", it may sound to the reader as if the recipe is random in some way.

Under Limitations: for long document --> for long documents
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation - Short Paper: 4

============================================================================
REVIEWER #2
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
This paper describes an unsupervised technique for recipe editing.
Given a base template recipe, the user can alter one of the ingredients using a binary string and generate a coherent cooking instruction with the altered ingredient list. The main objectives are ingredient prediction (multicategory binary classification) and recipe completion (masked language objective). The novel technique used is latent space editing, whereby the latent space z is modified to predict the altered ingredient list. During inference, a corrupted base template is given to the system with the new ingredient and corresponding instruction removed. In the latent space editing stage, z is iteratively fine-tuned towards the target ingredient list for the prediction objective. The baselines are general language models (GPT, BART, PLM) trained on recipe datasets.

Strengths
1/ The paper is well-motivated and outlines clear objectives.
2/ The addition of a latent space editing stage is reasonable.
3/ This work is in the area of controlled generation of text. An interesting application in recipe editing. I can see a lot of potential.

Weaknesses
1/ There is one main concern with the baselines used. It would be better to conduct an ablation study on RecipeCrit: one variant with masked inputs (both ingredients and instructions) followed by iterative critiquing, and another with only masked instructions. Comparing those two generated results with the base recipe would then shed light on the effect of the iterative critiquing technique. The results of the current comparison with fine-tuned models could be due to those models' prior bias. Having an ablation with the same training data is necessary here. Although BART and PLM share the same masked language objective, they cannot be considered an ablation study.
---------------------------------------------------------------------------

Computationally-aided linguistic analysis: No
NLP engineering experiment paper: Yes
Reproduction paper: No
Position paper: No
Resource paper: No

Reasons to accept
---------------------------------------------------------------------------
The community could benefit from this interesting application, but the baseline issue needs to be addressed.
---------------------------------------------------------------------------

Questions for the Author(s)
---------------------------------------------------------------------------
In the examples (Table 3) I observe a significant variation in the cooking instructions between RecipeCrit and the base recipe. What do you propose to limit this?

Have you tried to edit multiple ingredients at the same time? Or do you intend to frame multiple editing as multiple stages of single edits?
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation - Short Paper: 3.5

============================================================================
REVIEWER #3
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
The paper presents an innovative recipe editing system for ingredient substitution in recipes. It uses a generative language model to generate a new recipe that includes the new ingredients or removes existing ingredients based on the input recipe, instead of just substituting ingredients, which yields more coherent and relevant recipes.
The system builds representations of the title, ingredients, and instructions using transformer encoders to obtain a representation of the recipe. It then predicts whether or not the proposed ingredients were in the original recipe, treating the problem as multi-label binary classification, and finally generates the new recipe using a transformer decoder. The authors use an unsupervised critiquing method to better interpret the user's feedback and the predicted ingredients, in order to generate a more feasible recipe.

The system is then compared on the Recipe1M dataset against general and recipe-specific large language models. The evaluation includes quantitative and human evaluation, together with a case study for better explanation. The proposed system performs better than the baselines on all the proposed metrics.
---------------------------------------------------------------------------

Computationally-aided linguistic analysis: No
NLP engineering experiment paper: Yes
Reproduction paper: No
Position paper: No
Resource paper: No

Reasons to accept
---------------------------------------------------------------------------
- Very good writing and clear explanations.
- The proposed method allows for the use of non-annotated data and user feedback, allowing for more realistic recipes.
- Very complete experiments. The authors include human and quantitative evaluation of results, along with a case study to see the differences between this and other SoTA models.
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation - Short Paper: 4