============================================================================
ACL-IJCNLP 2021 Reviews for Submission #3466
============================================================================

Title: Unsupervised Enrichment of Persona-grounded Dialog with Background Stories

Authors: Bodhisattwa Prasad Majumder, Taylor Berg-Kirkpatrick, Julian McAuley and Harsh Jhamtani

============================================================================
META-REVIEW
============================================================================

Comments: This paper presents a very elegantly designed and evaluated approach to mitigating the potential dullness and inconsistency issues of open-domain conversational systems. The paper combines several previously proposed methods (e.g., PPLM, backward passes with soft constraints, retrieval) in an elegant way. The reviewers also found that the paper is well written and easily reproducible, and that the dialog-modeling community would see great merit in this paper.

============================================================================
REVIEWER #1
============================================================================

The core review
---------------------------------------------------------------------------
The authors propose a system that enriches persona-based dialog response generation with background stories retrieved from a story corpus. They use a pre-trained persona chat model, which remains unchanged, and employ gradient-based inference to modify the original model output. For a given dialog history input from PersonaChat, a persona is sampled and used to retrieve relevant stories from the ROCStories corpus. The sampled story is then used as a constraint to modify the original model output. In a human evaluation, models with the proposed inference method clearly outperform competitive baselines.
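The retrieval step described above can be illustrated with a toy sketch. This is my own illustration under assumed simplifications: the review does not specify the paper's actual retriever, and the bag-of-words cosine similarity, function names, and example strings here are all hypothetical.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_story(persona: str, stories: list[str]) -> str:
    """Return the story from the corpus most similar to the persona text."""
    p = Counter(persona.lower().split())
    return max(stories, key=lambda s: cosine(p, Counter(s.lower().split())))

# Hypothetical mini-corpus standing in for ROCStories.
stories = [
    "i went hiking in the mountains last summer and loved it .",
    "my grandmother taught me to bake bread every sunday .",
]
persona = "i love baking bread with my family ."
print(retrieve_story(persona, stories))
```

A real system would use learned sentence embeddings rather than word counts, but the selection logic (score every story against the persona, keep the argmax) is the same.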
---------------------------------------------------------------------------
Reasons to Accept
---------------------------------------------------------------------------
The paper is well written and should be easily reproducible with the submitted material. The proposed method shows clear improvement over non-enriched persona-grounded response generators.

---------------------------------------------------------------------------
Reasons to Reject
---------------------------------------------------------------------------
The human evaluation is potentially too weak, with only 2 annotators per test sample, though the results seem clear. Somewhat unclear presentation of results.

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 4

Questions for the Author(s)
---------------------------------------------------------------------------
You assert that "further human evaluation reveals that responses generated with moderate \lambda_d (= 1) are more engaging and sensible as compared to a very high (= 5) or very low (= 0.5) values.", a claim that is repeated in the appendix. However, neither in the paper nor the appendix do you show any scores or human-evaluation win/loss rates for the base model (\lambda_d = 1) compared to \lambda_d at 0.5 or 5. Why not?

In Table 1, it is a bit strange to highlight the PABST-PSEUDO column. First, those are not the best numbers in that table, as those come (unsurprisingly) from DiscChoice-Persona. Second, this is not even the intended target system, if I understand it right. PABST-PERSONA seems to be your actual proposed model, which is used in the human evaluation and pitted against the other systems (and wins quite convincingly...).

In the output examples, does PSEUDO refer to DiscChoice-Pseudo or PABST-Pseudo?
What are the personas and sampled stories in those outputs? I think these could be shown at least in the appendix; the paper itself probably doesn't have the space.

For the human evaluation, were samples randomized, or was PABST always the first system and the other models always second (or the other way around)?
---------------------------------------------------------------------------

============================================================================
REVIEWER #2
============================================================================

The core review
---------------------------------------------------------------------------
The paper proposes to generate background-augmented dialog responses by incorporating story texts from external corpora. Noting that real conversations are usually supported by rich background stories, while existing dialog systems (even persona-aware ones) generate responses with limited content, the authors propose to retrieve suitable background stories and use them to generate content-rich responses.

STRENGTHS:
* Constructing/identifying supporting facts for response generation is very meaningful. I believe the community will benefit a lot from work in this direction.
* Concrete and effective methods are proposed for both how to retrieve related background stories and how to adapt them for generating fluent responses.
* Sufficient experiments and details.

WEAKNESSES:
* I don't see any obvious weakness. If I need to give a reason for not giving the highest recommendation score, it is that the proposed method is more of a combination and improved version of existing techniques, so the novelty is a bit limited.

CONCLUSION: This is a compact short paper that identifies an important question, proposes an effective solution, and presents convincing results.
While there still remain many interesting topics to explore, for example, 1) whether the same technique can be applied to other corpora with plain conversations only and 2) whether it could be used to construct training data for building fact-aware open-domain response generation systems, the currently presented work is pretty good for a short paper in my opinion.

---------------------------------------------------------------------------
Reasons to Accept
---------------------------------------------------------------------------
* Attempts to solve a meaningful problem in dialog response generation.
* Effective solutions are proposed and confirmed by experimental evaluation.

---------------------------------------------------------------------------
Reasons to Reject
---------------------------------------------------------------------------
None.

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 4

============================================================================
REVIEWER #3
============================================================================

The core review
---------------------------------------------------------------------------
This paper proposes a novel approach to enriching dialog agent personas with relevant backstories, relying only on existing story datasets. In particular, the authors propose an unsupervised back-propagation-based decoding procedure that adapts the relevant stories so that the resulting response is fluent with the dialog history and consistent with the dialog agent persona. Experiments demonstrate the effectiveness of the proposed approach.
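The "back-propagation based decoding" idea that the reviewers refer to can be illustrated on a toy vocabulary. This is a generic gradient-steering sketch in the spirit of PPLM-style methods, not the authors' actual implementation: the function names and hyperparameters are hypothetical, and `lam` merely plays the role of a constraint weight analogous to the \lambda_d discussed by Reviewer #1.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def steer(base_logits, constraint_idx, lam=1.0, lr=0.5, steps=100):
    """Gradient-steer an output distribution toward a constraint token
    while a KL term keeps it close to the base model's distribution."""
    base = softmax(base_logits)
    z = base_logits.copy()
    for _ in range(steps):
        p = softmax(z)
        # gradient of -log p[constraint] w.r.t. logits: p - one_hot(constraint)
        g_con = p.copy()
        g_con[constraint_idx] -= 1.0
        # gradient of KL(p || base) w.r.t. logits: p * (log_ratio - KL)
        log_ratio = np.log(p) - np.log(base)
        g_kl = p * (log_ratio - (p * log_ratio).sum())
        z -= lr * (lam * g_con + g_kl)
    return base, softmax(z)

base, steered = steer(np.array([2.0, 1.0, 0.0, -1.0]), constraint_idx=3)
print(steered[3] > base[3])  # the constraint token gains probability mass
```

In a real model the perturbation is applied to hidden states or logits at each decoding step, and the fluency term comes from the frozen dialog model itself; the trade-off between constraint satisfaction and staying close to the base distribution is exactly what a weight like \lambda_d controls.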
---------------------------------------------------------------------------
Reasons to Accept
---------------------------------------------------------------------------
The paper is well written in general. The idea of introducing a backward pass with soft constraints seems to be new. The experiments are extensive and validate the new method.

---------------------------------------------------------------------------
Reasons to Reject
---------------------------------------------------------------------------
No obvious weaknesses.

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 4