============================================================================
                                REVIEWER #1
============================================================================

What is this paper about, what contributions does it make, and what are the
main strengths and weaknesses?
---------------------------------------------------------------------------
This paper proposes a two-stage framework that uses global knowledge to
improve clarification question generation by identifying missing information
and asking useful questions. Specifically, missing information is detected by
comparing local context information against global information within the
same category. The missing-information and question pairs are then fed into a
fine-tuned BART model to generate useful questions. Many baselines are
compared under both automatic and human evaluation, and the results show
state-of-the-art performance for the proposed model.
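
For concreteness, a minimal sketch of how the first stage could work; the
`extract_schema` interface, the frequency threshold, and the set-difference
formulation are illustrative assumptions, not the authors' actual
implementation:

    # Illustrative sketch of stage 1 (missing-information detection).
    # Schemata are modeled as sets of attribute strings; extract_schema is
    # a hypothetical interface for pulling schema elements from a context.
    from collections import Counter

    def build_global_schema(contexts_in_category, extract_schema, min_freq=5):
        """Aggregate schema elements across all contexts of a category."""
        counts = Counter()
        for context in contexts_in_category:
            counts.update(extract_schema(context))
        # Keep only elements frequent enough to be category-typical.
        return {elem for elem, freq in counts.items() if freq >= min_freq}

    def missing_schema(context, global_schema, extract_schema):
        """Missing information = global schema minus the local schema."""
        local_schema = set(extract_schema(context))
        return global_schema - local_schema
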
Strengths:
1. Missing information is detected by comparing the local context against
   global knowledge.
2. A usefulness criterion is added to the generation model to improve the
   usefulness of the generated questions.

Weaknesses:
1. The paper would benefit from more analysis of, and insight into, the
   generated results and the model.
2. Constructing the global knowledge from the train, validation, and test
   datasets may effectively train the model on the validation and test sets,
   which could lead to an unfair comparison with the other baselines.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
The proposed model is novel, and it achieves the best performance against
several baselines on different datasets.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
See the weaknesses above.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
1. Line 104 states that all similar contexts in the data, including train,
   validation, and test, are grouped together to construct the global
   knowledge. Could this operation effectively expose the model to the
   validation and test sets in some way?
2. In Figure 1, the global schema is collected from the class "Laptop
   Accessories". This class may be too general, and the collected features
   may be irrelevant to some specific product descriptions; for example,
   some features of a USB-C cable may be noise for a laptop briefcase. Is
   this issue frequent in the Amazon dataset? How do you deal with it?
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 3.5

============================================================================
                                REVIEWER #2
============================================================================

What is this paper about, what contributions does it make, and what are the
main strengths and weaknesses?
---------------------------------------------------------------------------
This paper aims to identify the missing information in a given context and to
generate clarification questions that reduce the ambiguity. A two-stage
framework is proposed: the first stage computes the difference between the
schema of the global knowledge (the global schema) and that of the context
(the local schema), and the second stage feeds the resulting missing schema
to a BART-based encoder-decoder model augmented with a PPLM decoder.
Exhaustive experiments are conducted to evaluate the usefulness of the
proposed architecture and to analyze each component.
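
As a rough illustration of the second stage, the missing schema could
condition a fine-tuned BART model along the following lines; the separator
format and the base checkpoint are assumptions, and the PPLM-based
usefulness control is omitted for brevity:

    # Illustrative sketch of stage 2 (question generation) with Hugging Face
    # transformers; assumes BART has been fine-tuned on
    # (context + missing schema) -> question pairs.
    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    def generate_question(context, missing_elements):
        # Concatenate the context with the missing schema elements so the
        # decoder is steered toward asking about exactly that information.
        source = context + " </s> " + ", ".join(missing_elements)
        inputs = tokenizer(source, return_tensors="pt", truncation=True)
        output_ids = model.generate(inputs.input_ids, max_length=64,
                                    num_beams=4)
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)
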
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
1. The paper considers an important problem: asking clarification questions
   given a context in community QA. It is also the first work to explore
   clarification question generation in a goal-oriented dialog scenario.
2. The paper proposes a two-stage architecture with many components, and the
   authors conduct exhaustive experiments to analyze the usefulness of each
   component. The results demonstrate the effectiveness of the proposed
   method.
3. The paper is well presented, and the details of the model are introduced
   clearly.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
1. The definition of usefulness is not well formed. According to the
   appendix, usefulness is defined for a question, but the classifier is
   trained on the schema elements of a question. If a question is useful,
   that indicates the combination of its schema elements is useful, but it
   does not imply that each individual element is useful.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
1. According to Table 7, the model tends to generate longer sentences after
   using PPLM. Did you investigate why the PPLM-equipped decoder tends to
   generate longer sentences?
2. For each example, there could be multiple perspectives on the missing
   information. Did you conduct an experiment analyzing the coverage of the
   generated questions over the missing schema?
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 3.5

============================================================================
                                REVIEWER #3
============================================================================

What is this paper about, what contributions does it make, and what are the
main strengths and weaknesses?
---------------------------------------------------------------------------
The paper addresses clarification question generation, which requires
identifying useful missing information in a given context. The authors
present a two-part framework: (1) use global knowledge to find the missing
information, and (2) use a large pre-trained language model to generate a
useful question. Overall, the comparison between the global schema and the
local schema is interesting, the proposed method is intuitive and
reasonable, and the extensive experimental results back up the claims.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
1. A reasonable clarification question generation system that obtains
   state-of-the-art results.
2. The use of global knowledge is interesting.
3. Well written.
4. Code is available.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
1. The global schema appears to be built from the given dataset, but in a
   real application the global data may be dynamic. Is the proposed method
   flexible enough to handle this?
2. In the human evaluation, the human-annotated questions score similarly
   to, or even worse than, the baselines. Does this indicate that the
   baseline models are already sufficient for generating good questions, or
   that the human evaluation is not well designed?
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 3.5