============================================================================
ACL 2023 Reviews for Submission #2373
============================================================================

Title: KNOW How to Make Up Your Mind! Adversarially Detecting and Remedying Inconsistencies in Natural Language Explanations

Authors: Myeongjun Jang, Bodhisattwa Prasad Majumder, Julian McAuley, Thomas Lukasiewicz and Oana-Maria Camburu

============================================================================
META-REVIEW
============================================================================

Comments: Reviewers agreed that the paper is well-written and thorough, and that the method is sound in addressing NLE consistency via grounding in real-world knowledge. R2 recommended softening certain claims ("remedy" to "filter"), while R3 had concerns about data leakage between the prediction task and the knowledge base. Both concerns were addressed satisfactorily in the author response. With a consensus on both soundness and excitement, I recommend this paper for acceptance to the main conference.

============================================================================
REVIEWER #1
============================================================================

What is this paper about and what contributions does it make?
---------------------------------------------------------------------------
While NLE models have demonstrated considerable promise in a variety of applications, researchers and practitioners have noted a number of issues. Interpretability, bias, generalization, scalability, user comprehension, and inconsistency between NLEs are some of the main problems with NLE models. This paper evaluates NLE models by providing an adversarial attack method that checks their robustness in producing consistent results. It also presents a method to reduce inconsistency between the various NLEs produced by these models in a single context.

Finding inconsistencies in natural language explanations (NLEs) generated by NLE models is an important part of assessing their quality and reliability. Inconsistencies in NLEs can indicate a flaw in the model's reasoning or in its ability to accurately explain its output to the user, so they must be properly addressed. Detecting inconsistencies in NLEs is also important in applications where the model's output has a significant impact on the user or society as a whole. In applications such as healthcare, finance, or legal decision-making, for example, it is critical that NLE models provide users with accurate and consistent explanations. Moreover, identifying inconsistencies in NLEs can assist developers in improving a model's performance and reliability: by analyzing the types of inconsistencies that occur and the reasons for them, developers can improve the model's architecture, training data, or explanation-generation algorithms.

---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
About the paper structure: The paper is written in clear and concise language, with a logical, organized structure that is easy to follow. It also includes well-designed tables that are easy to read and understand and that provide additional information. It correctly cites relevant and current references.
It clearly presents the research results and the conclusions drawn from them; additionally, the paper is transparent about any limitations or uncertainties in the findings.

About the technical advantages: The paper contributes a model for evaluating NLE consistency, as well as methods for correcting these inconsistencies, which can bring a number of benefits to the NLP community, including:

Improving the overall accuracy and reliability of natural language processing systems. Consistency is essential in NLP for accurate and dependable results. By adopting this approach, NLP systems can become more accurate and reliable, which can benefit a variety of applications such as machine translation, sentiment analysis, and speech recognition.

Increasing user trust: Users are more likely to trust and rely on NLP systems when they consistently produce accurate results. This could lead to greater use of NLP technology in a variety of domains, ranging from customer-service chatbots to legal document analysis.

Contributing to the development of evaluation models and of methods for resolving inconsistencies can help advance the state of the art in NLP. This can lead to new research questions and potential breakthroughs in the field.

---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
None

---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
You mentioned in the introduction that your adversarial attack model is fast. It would therefore be beneficial to include a comparison of your model with the eIA model. You stated in the results section, "We observe that eIA generates a tremendous amount of inconsistent candidates (Ie), e.g., 24M for e-SNLI," but you did not report the corresponding observation for your proposed model; please include this for consistency.

Do you have any examples for the statement "better NLE quality may not necessarily guarantee fewer inconsistencies" in the same section? Is it just noise and errors in the data, a lack of diversity in the training data, or variation in language use, or are there other reasons?

It would be preferable to include the related work in the preceding sections.

How many of the remaining inconsistencies come from the same set of inconsistent NLEs present before applying the Know model (in other words, I want to know how much injecting limited knowledge into the model biases it toward new sets of inconsistencies)? Or is it possible that we are jeopardizing reasoning correctness for the sake of consistency, and that we will end up with two consistent NLEs that are not aligned with the ML model's inference?

---------------------------------------------------------------------------
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Soundness: 4
Excitement (Short Paper): 3.5
Reviewer Confidence: 3
Recommendation for Best Paper Award: No
Reproducibility: 4
Ethical Concerns: No

============================================================================
REVIEWER #2
============================================================================

What is this paper about and what contributions does it make?
---------------------------------------------------------------------------
The paper proposes an approach for generating adversarial attacks on Natural Language Explanation (NLE) models, which improves upon the existing explanation Inconsistency Attack (eIA) framework by replacing simple lexical negation rules with new ones that are conditioned on part-of-speech and on antonym pairs obtained from a knowledge base (e.g., ConceptNet), in order to generate the inconsistent (adversarial) explanations. It also proposes using the knowledge base (e.g., ConceptNet) to generate relation statements about the entities in a premise and injecting those statements as context to the premise itself, as a remedy for the NLE models' inconsistencies.

The core contribution of the paper consists in leveraging human-curated knowledge bases to produce the adversarial explanations more correctly, consistently, and efficiently, and also to direct NLE generation behaviour. A secondary contribution is the quantitative analysis of the consistency of SOTA NLE models, which highlights important problems with NLE models.

---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
The work tackles an important issue regarding NLE inconsistency: the lack of semantic grounding information, and the consequent loss of confidence in the generated explanations. The proposed approach leverages such information in order to better measure model consistency and possibly improve it. Despite its limitations, the results indicate that this approach is a solid improvement over the previously cited adversarial methods. The text is well written overall.

---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
While the proposed adversarial approach is a well-grounded contribution and brings relevant information about NLE model behaviour, the proposed "remedy" does not tackle the fundamental issue with NLE consistency evidenced in the experimental results: the semantic grounding information is not modeled, but rather "hinted" to the model so that it can guide syntactic pattern generation. The NLE generation process is thus still inconsistent in essence, and the confidence problem remains. Framing the solution as a "filter" or "guide" to improve explanation quality, instead of a "remedy", would be a better claim. Additionally, the difference in time between eIA and eKnowIA is not clarified in the text (see questions for the authors).

---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
Question A: What is the main factor causing the difference in time between eIA and eKnowIA? Is it a computational issue or manual filtering of the examples? This should be clarified in the text.
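
For concreteness, my reading of the knowledge-mining and injection steps is roughly the sketch below. This is not the authors' code: the use of the public ConceptNet /query endpoint and the verbalization template are my own assumptions. If this reading is correct, stating the verbalization rules explicitly in the paper would aid reproducibility.

    # Hedged sketch (reviewer's reading, not the authors' implementation):
    # mine antonym pairs from ConceptNet and prepend a verbalized relation
    # statement to the premise as extra context.
    import requests

    API = "http://api.conceptnet.io/query"  # public ConceptNet web API

    def antonyms(word, limit=10):
        """Fetch English antonyms of `word` via ConceptNet's /query endpoint."""
        resp = requests.get(API, params={"start": f"/c/en/{word}",
                                         "rel": "/r/Antonym",
                                         "limit": limit})
        resp.raise_for_status()
        return [edge["end"]["label"]
                for edge in resp.json().get("edges", [])
                if edge["end"].get("language") == "en"]

    def inject_knowledge(premise, word, antonym):
        """Prepend a verbalized relation statement (illustrative template only)."""
        return f"{word} is the opposite of {antonym}. {premise}"

    if __name__ == "__main__":
        word = "tall"
        candidates = antonyms(word)  # e.g., ["short", ...]
        if candidates:
            print(inject_knowledge("A tall man is standing on the bus.",
                                   word, candidates[0]))
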
---------------------------------------------------------------------------
Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
L206: prohibiting -> prohibitive

---------------------------------------------------------------------------
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Soundness: 2
Excitement (Short Paper): 3.5
Reviewer Confidence: 4
Recommendation for Best Paper Award: No
Reproducibility: 4
Ethical Concerns: No

============================================================================
REVIEWER #3
============================================================================

What is this paper about and what contributions does it make?
---------------------------------------------------------------------------
The paper presents a new adversarial attack for NLI and commonsense reasoning tasks. The attack utilizes external knowledge to produce adversarial examples involving logically inconsistent explanations. In particular, candidate antonym replacements and unrelated noun replacements are mined from ConceptNet. Experiments with multiple models on e-SNLI and CoS-E suggest the effectiveness of the proposed attack. In addition, the paper proposes a simple yet effective remedy: extracting related commonsense knowledge, encoding it using rules, and adding the knowledge to the inputs.

---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
The evaluation is thorough, covering multiple models and two datasets. The authors also carefully consider the quality of the explanations and assess it with human evaluation. The paper is well written and easy to follow.

---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
I am a bit concerned about whether the proposed remedy could generalize. The remedy uses the same knowledge that is used for creating the attack, so there are some "leakage" issues here. Also, it might be good to provide more analysis of the language of the attack, e.g., fluency and grammaticality, to assess its quality.

---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
Question A: For the results in Table 2, do you need to retrain the models if you inject the knowledge (which alters the distribution of the inputs)?

---------------------------------------------------------------------------
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Soundness: 4
Excitement (Short Paper): 3.5
Reviewer Confidence: 3
Recommendation for Best Paper Award: No
Reproducibility: 4
Ethical Concerns: No