============================================================================
EMNLP 2021 Reviews for Submission #3744
============================================================================

Title: Detect and Perturb: Neutral Rewriting of Biased and Sensitive Text via Gradient-based Decoding

Authors: Zexue He, Bodhisattwa Prasad Majumder and Julian McAuley

============================================================================
META-REVIEW
============================================================================

Comments: This paper introduces a framework for removing bias from text by rewriting sentences with sensitive attributes into more neutral forms. One key advantage of the proposed method is that no parallel corpus is required for model training. The evaluation results are strong in comparison to the baselines. However, the dependence on annotated data from which to identify sensitive words may limit the practicality of the proposed approach.

Ethics Metareview: This paper was reviewed by the ethics committee. The review(s) are included below.

============================================================================
REVIEWER #1
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
This paper proposes to neutralize text by detecting sensitive phrases and perturbing them using a decoder with a neutralizing constraint. The goal of the task is interesting and could potentially benefit society. However, this work is not ready for publication in its current form.

Weaknesses:
- It is hard to understand all the content in this paper. (1) The main method for perturbing is not clear enough; see the questions for the authors. (2) See further comments under presentation improvements.
- The objective of the model could be misleading.
  See L152: there is no guarantee of semantic relevance to the original text. This means that the model only chooses 'bad' attribute words and does not consider fluency or semantic relevance. That is probably why the BLEU-4 of PEN is much worse than that of other models.
- Baselines: some strong baselines in text neutralization [1,2] and text style transfer [3] are missing.
- Which evaluation metrics to rely on is left open to the readers. How is the 'best' model selected? How well do these automatic metrics align with human inspection?
- The novelty of this paper is not significant. (1) The mask-and-rewrite approach was discussed in [3] and other papers. (2) The masking approach is based on Jain et al. (2020). (3) I still have concerns about the rewriting model.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
The topic is interesting and beneficial. The detect-and-rewrite framework is rational.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
Same as the weaknesses.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
What is U(C)? Is it P(y=a)? If so, couldn't 1/|C| itself already be biased? For example, one class may be dominant in a corpus.
L057: How are minimal edits defined?
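To make the question above concrete: my reading of the neutralizing constraint at L152 is a penalty that is zero exactly when the predicted attribute distribution matches the uniform target U(C) = 1/|C|. A minimal sketch of that reading follows; the function name and the probability-list interface are my own illustrative assumptions, not the authors':

```python
import math

def neutralizing_penalty(attr_probs):
    """KL divergence KL(p || U) between a predicted attribute
    distribution p and the uniform distribution U(C) = 1/|C|.
    Zero exactly when p is uniform; grows as p concentrates on
    one (possibly corpus-dominant) class."""
    c = len(attr_probs)
    return sum(p * math.log(p * c) for p in attr_probs if p > 0)
```

Under this reading, the concern stands: if the corpus label distribution is skewed, pushing p toward 1/|C| is not the same as removing the attribute signal.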
---------------------------------------------------------------------------
Missing References
---------------------------------------------------------------------------
[1] Privacy-Aware Text Rewriting (INLG 2019)
[2] Automatically Neutralizing Subjective Bias in Text (AAAI 2020)
[3] Transforming Delete, Retrieve, Generate Approach for Controlled Text Style Transfer (EMNLP 2019)
---------------------------------------------------------------------------
Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
L006 and L008: 'at best' and 'at worst' -> 'In the best case'?
L035: 'outputs' -- do you mean 'decisions'?
L051: isn't available -> is unavailable
L055: regenerate -> regenerating. In most cases, 'rewrite' is better than 'regenerate'.
L266: 'harder' than what?
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Reproducibility: 3
Ethical Concerns: No
Overall Recommendation - Short Paper: 2.5

============================================================================
REVIEWER #2
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
This paper introduces a framework to rewrite sentences into a more neutral form, in the sense that target sensitive attributes (e.g. gender) are less recognizable from the text.
The framework consists of three parts: 1) a classification model trained on the sensitive attribute to identify salient parts via attention scores; 2) the detected text is masked out and a separate Seq2Seq model is trained to recover it; 3) a gradient-based decoding method, adapted from the earlier Plug and Play framework [1], is then added to reward generation that reveals less of the sensitive attribute. The authors propose a neutralizing constraint that rewards model generations when the sensitive attribute distribution predicted from the generated text is closer to the uniform distribution.

Strengths:
S1. The framework is explained clearly and is intuitive.
S2. Detailed evaluation results are provided for bias reduction, fluency, and coherence, and show good results against the baseline methods of weighted decoding and adversarial training.

Weaknesses:
W1. In comparison to the earlier Plug and Play framework [1], the main modification is the proposed neutralizing constraint. While the main topic of [1] is controlled generation that promotes attributes, related tasks such as language detoxification have already been covered there.
W2. Previous works in this field, such as [2], are not discussed.

[1] Plug and Play Language Models: A Simple Approach to Controlled Text Generation, ICLR, 2020
[2] Privacy-Aware Text Rewriting, INLG, 2019
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
A1. The proposed framework doesn't require a parallel corpus for training, which makes it well suited to this task of removing biased information from text.
A2. Detailed experimental results are provided, which can serve as a point of comparison for future research in the less-explored field of debiasing by rewriting.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
R1.
The contribution beyond the Plug and Play framework is limited.
R2. Classification accuracy on its own may not be a sufficient score for the debiasing result, as it does not distinguish between the case where all sentences are made more neutral and the case where they are pulled even further to the opposite side.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
The $k$ for the top-k words to be masked is said to be a hyper-parameter. How does its value affect performance? Does the model always mask the top-k words for all sentences, or only for biased ones?
---------------------------------------------------------------------------
Missing References
---------------------------------------------------------------------------
[1] Xu, Q., Qu, L., Xu, C., & Cui, R. (2019). Privacy-Aware Text Rewriting. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 247-257).
---------------------------------------------------------------------------
Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
1. LN217: generates
2. LN152: although it is easy to infer, it may be better to indicate that U is the uniform distribution.
3. Wrong reference: "Style Transfer Through Back-Translation" was accepted to ACL.
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Reproducibility: 5
Ethical Concerns: No
Overall Recommendation - Short Paper: 3.5

============================================================================
REVIEWER #3
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
This paper focuses on the task of removing biases from text. Bias in this context means "specific words and/or phrases that reveal 'sensitive' information about the author". Some of these biases may be explicit (such as "she is hard-working") and some may be implicit (such as "basketball" being a term used predominantly by men). This paper presents a method called "DePen" that "neutralizes" texts. The paper is well written and clearly explained given the space constraints.
The main strengths of this paper are:
* A clear and straightforward method
* Good results in comparison to strong baselines

The main weaknesses of this paper are:
* It depends on training data that may not be available
* The task is not novel
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
The method presented in this paper is well executed and clearly explained, and it shows a firm grasp of the current theory regarding how Seq2Seq models and/or Transformers achieve high performance in multiple tasks: instead of treating BART as a black box, the approach depends on modifying the behavior of pre-trained Transformers to achieve an objective for which they were not originally designed. Should this paper be accepted, future researchers can learn a lot about how to tweak Transformers for their own needs.

The results presented in Table 2 seem solid, and the authors present strong evidence that their model performs as well on their task as one could expect.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
Unlike other approaches, where one identifies biases (for instance) in the embeddings themselves by detecting those that have an unusually high value w.r.t. a "bias" measure, this approach depends on annotated data from which to identify sensitive words. This means that the method is only useful if one has access to a corpus of sensitive information. Paradoxically, such a corpus would then be a valuable resource for building a system capable of identifying sensitive attributes based on clear text, as the authors have done here when following (Jain et al., 2020).
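For concreteness, the detection step referred to above (masking the tokens the attribute classifier attends to most, following Jain et al., 2020, before rewriting) can be sketched roughly as follows; the function name, the fixed top-k interface, and the mask token are illustrative assumptions on my part, not the authors':

```python
def mask_top_k(tokens, attention_scores, k, mask_token="[MASK]"):
    """Replace the k tokens with the highest attribute-classifier
    attention scores by a mask token, so a rewriting model can
    later fill the masked slots with neutral text."""
    ranked = sorted(range(len(tokens)),
                    key=lambda i: attention_scores[i], reverse=True)
    top = set(ranked[:k])
    return [mask_token if i in top else tok for i, tok in enumerate(tokens)]
```

The concern above is precisely that the attention scores here come from a classifier trained on sensitive annotations, so the same corpus also enables the identification system described.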
---------------------------------------------------------------------------
Missing References
---------------------------------------------------------------------------
The authors dedicate about 1/4 of a page to Related Work, which is fine given the constraints. That said, I encourage the authors to add more references to other approaches in bias removal. While I do not explicitly suggest adding these papers, these two references came immediately to mind when reading the paper:
* Learning to Flip the Bias of News Headlines - https://aclanthology.org/W18-6509.pdf
* Debiasing Pre-trained Contextualised Embeddings - https://aclanthology.org/2021.eacl-main.107.pdf

Should their paper be accepted, the authors may wish to add some more background information in Related Work.
---------------------------------------------------------------------------
Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
Should the paper be accepted, I encourage the authors to include a "Conclusion" section. I have not deducted points for this, but the paper's abrupt ending is a bit disconcerting.

I also suggest that the authors use "Seq2Seq" or even "S2S" instead of the current "s2s" abbreviation for "sequence-to-sequence". "s2s" tends to get lost in the content.
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Reproducibility: 4
Ethical Concerns: Yes

Ethics Justification
---------------------------------------------------------------------------
The authors present an approach for removing biases from text, and even point out (correctly) that some biases may be implicit (such as "basketball" being strongly correlated with texts written by men).
The authors then proceed to train a classifier capable of identifying these biases in text, following existing work (Jain et al., 2020). The authors do move forward, eventually showing how to build a system capable of removing these biases. But if the authors consider "biased text" to be a problem worth solving, it is only fair for them to address whether it is acceptable that such a system requires a system for identifying sensitive information to begin with. An unscrupulous scientist could use this very paper as a blueprint for identifying minorities based on their writings in a way that would be hard to prove. The authors include an ethics statement, but they do not mention this issue.
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation - Short Paper: 4

============================================================================
REVIEWER #4
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
Following advice from the scientific reviewers, the ethics committee would like to suggest that the authors extend their paper with information about potential misuse of the findings, to help delimit responsible use and identify problematic use.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
This is an ethics review, so I am ignoring this field.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
This is an ethics review, so I am ignoring this field.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
This is an ethics review, so I am ignoring this field.
---------------------------------------------------------------------------
Missing References
---------------------------------------------------------------------------
This is an ethics review, so I am ignoring this field.
---------------------------------------------------------------------------
Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
This is an ethics review, so I am ignoring this field.
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Reproducibility: 5
Ethical Concerns: Yes

Ethics Justification
---------------------------------------------------------------------------
This is an ethics review, so I am ignoring this field.
---------------------------------------------------------------------------
Overall Recommendation - Short Paper: 5