**************************************************
Conditional accept

Review #352A
===========================================================================

Review recommendation
---------------------
4. Minor revision

Reviewer expertise
------------------
3. Knowledgeable

Overall merit
-------------
3. Top 25% but not top 10% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
The paper proposes WaveGuard, a defense framework that aims to detect adversarial examples constructed by four recent audio adversarial attacks. The framework uses different audio transformation functions for detection. The authors conducted experiments to demonstrate the effectiveness of WaveGuard and show it can withstand adaptive attackers.

Strengths
---------
The paper is easy to follow, well-organized, and well-written. The experiments demonstrate the effectiveness of the framework and analyze how the hyperparameters affect the transformation functions. The proposed framework is efficient (average wall-clock time is low across all input transformation methods).

Weaknesses
----------
Most of the weaknesses I pointed out in the last round are addressed.

Comments for author
-------------------
Overall, I think the authors address the requested changes well. The evaluations in the current version are sound. Specifically, the authors implement a differentiable version of LPC and use it to attack the proposed defense, which makes the results look more convincing. The authors also update the results of adaptive attacks with different parameter settings. In addition, the threshold calculation procedure and the evaluation of transfer attacks are included in the appendix. Last but not least, the open-source implementations are also available. Judging the paper as it is, I vote for acceptance.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #352B
===========================================================================

Review recommendation
---------------------
5. Accept

Reviewer expertise
------------------
4. Expert

Overall merit
-------------
3. Top 25% but not top 10% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
This paper introduces WaveGuard, a technique to detect and prevent audio adversarial examples from occurring. The technique proposes several preprocessors g(x) and then detects whether or not an input is adversarial by comparing the classification f(x) with the classification of f(g(x)). The detection is highly accurate, and prevents most attacks using LPC and Mel Inversion.

Strengths
---------
+ The proposed technique is interesting and on an important problem.
+ The paper performs adaptive attacks to try and break the proposed defense proposals, and indeed breaks many of their weaker proposals.
+ The writing quality is high, with all details explained.

Weaknesses
----------
- I still don't believe the claimed results, but at least they're within the realm of possibility.

Comments for author
-------------------
I was reviewer C last time. Beginning with a direct response to the authors' response:

1. The ROC curve in Appendix A is the ROC curve for the *non-adaptive* attack. This is uninteresting. Security papers do not care about how a system performs when the adversary is unaware of the defense; only the adaptive ROC curve matters. The paper does not present this.

2. These constants help understand the defense.

3. Longer discussion below.

4. Thank you for this analysis.
5. It is wonderful to have code (especially now); however, there appear to be some issues with it. I will not hold this against the paper because any code is better than no code, and I wouldn't have held no code against the paper at submission time.

- On my machine, running the Jupyter notebook fails in that the output of the audio wavefile is always 0. This is caused because the method voiced_unvoiced always returns [0,0,0...]. I don't know why this is.

- It's great to show that f_tf(x) == f_np(x), but that's kind of just the bare minimum. It would also be good to know that the gradients are correct, for example by numerically performing finite differences on f_np and comparing to f_tf.

--

Longer discussion around comment 3:

The paper is much improved with respect to its evaluation. The prior paper argued robustness that was so strong as to be impossible, and with the new and improved attacks the paper claims much weaker results. For example, the prior paper claimed 100% AUC at a distortion bound of 16,000 whereas the new revision claims 77% AUC at 1,000 distortion. The authors do note that this result is still significantly stronger than anything published previously in the literature for defenses to audio adversarial examples.

However, the results are now somewhat messier. The prior paper was trying to make the case that LPC was a good defense and is what should be used; the new paper advocates instead for Mel Extraction---or, rather, it should, because that's what gives stronger results.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #352C
===========================================================================

Review recommendation
---------------------
4. Minor revision

Reviewer expertise
------------------
2. Some familiarity

Overall merit
-------------
4. Top 10% but not top 5% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
The paper proposes a framework, WaveGuard, for defending automatic speech recognition systems (ASRs) from audio adversarial examples. The authors explore different audio transformation functions and analyze the ASR transcriptions of the original and transformed audio to detect adversarial inputs. Through comprehensive experiments, the authors demonstrate that the defense framework can successfully flag adversarial examples with high detection accuracy, even in the presence of adaptive attacks.

Strengths
---------
1) The paper is very well-written and easy to follow.
2) Good detection accuracy against different types of attacks.
3) Experiments were performed against both static and adaptive attacks.
4) Mel extraction-inversion and LPC show great promise.

Weaknesses
----------
1) The idea of comparing the transcriptions of the original and transformed audio is not novel.

Comments for author
-------------------
This is a good submission and on an important problem. The authors did an excellent job of presenting the topic in an easy-to-understand way. The quality of the writing is outstanding.

The threat model considered in this paper is reasonably strong. The results show that the proposed defense framework has pretty high detection accuracy when detecting adversarial inputs against four recent attacks on state-of-the-art automatic speech recognition systems.

One of this paper's strengths is that it also considers adaptive attacks, where the attacker knows the defense method being used. The authors show that such an attack can potentially break many defense techniques.
This shows that it is very difficult to defend ASRs from adversarial examples when the attacker is adaptive. However, the experiments with Mel extraction-inversion and LPC show significant promise.

I think it will benefit the community if the authors open-source the complete code for the paper.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #352D
===========================================================================

Review recommendation
---------------------
5. Accept

Reviewer expertise
------------------
3. Knowledgeable

Overall merit
-------------
4. Top 10% but not top 5% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
N.B. this is a major revision; the summary is mostly the same as in the previous review.

The paper presents WaveGuard, a framework for defending speech recognition systems from adversarial inputs. Multiple concrete implementations of techniques that conform to the proposed framework are discussed, and each of these is then evaluated as a potential defence. Following this, an adaptive attack is considered against the defence, where the attacker knows the defence being applied, including the parameters being used. This evaluation shows that for several of the functions this knowledge degrades the defence performance significantly, as the inclusion of the defence in the loss function allows an attacker to work around this. However, promising results are obtained for the Mel Extraction and Inversion and the LPC techniques, which both require high levels of distortion to break the defence. The paper finishes with a discussion of the results and a conclusion.

Strengths
---------
- Comparison of techniques is interesting
- Has completed the major revision requests
- Expanded the LPC analysis to find its limit.

Weaknesses
----------
- Would be good to include more detail on the specific versions of DeepSpeech and Lingvo used, hyperparameters, etc., although I assume these will be apparent in the code release.

Comments for author
-------------------
Some suggestions to improve the paper:

* Consider adding a "final model" of WaveGuard to Section 9, detailing which classifiers you envision being used, the performance costs of using this model, and how the output of WaveGuard might be interpreted in practice.

* Clarify the attacker strategy / mindset behind the adaptive attack algorithm presented in Section 7.2 and why this particular algorithm is representative.

* Either expand the number of targeted adversarial commands by several orders of magnitude, or add suitable justification for why doing so is not necessary.

* What is the expected behavior of a speech recognition system which incorporates WaveGuard in the presence of an adversarial attacker? Does it refuse to answer a query, attempt to extract and respond to the benign original query, or do something else altogether? Can you help explain your presented metrics in terms of how successful the tool is at this chosen strategy and how much longer it takes to respond to a query vs. not validating the query at all?

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #352E
===========================================================================

Review recommendation
---------------------
2. Reject and resubmit

Reviewer expertise
------------------
2. Some familiarity

Overall merit
-------------
2. Top 50% but not top 25% of submitted papers

Writing quality
---------------
3. Adequate

Paper summary
-------------
(See summary from first submission.)
Strengths
---------
(See summary from first submission.)

Weaknesses
----------
(See summary from first submission.)

Comments for author
-------------------
My comments will specifically address the suggested changes for the major revision.

1. Evaluate the effect of the defense on clean accuracy. What is the true positive rate at low false positive rates? I don't see this addressed in the revision. I would have expected a table or discussion that describes the CERs when benign samples have been transformed, along with the false positive rates for each transform (e.g., how often a benign sample is flagged as malicious). This should be provided in addition to the AUC scores - I would like to see actual true and false positive rates. Appendix A provides the ROC curves for the static attacks (Table 2), but does not provide the ROC curves for the adaptive attacks (Table 4).

2. Implement a differentiable version of LPC and use this to attack the proposed defense. If LPC cannot be implemented in an end-to-end differentiable manner, then create a function that approximates it and use this approximation. Addressed in Section 8 (and the new values in Table 4). (Although it's unclear why the distortion metrics were changed across the board. It is also interesting that LPC does not perform as well as indicated in the original submission.)

3. Run extra sanity checks to confirm the attack isn't flawed. For example, here are three common approaches: (a) Evaluate the attack success rate when the perturbation magnitude is equal to the maximum loudness of the audio sample. (For example, look into why attacks with a maximum of 2.1dB fail.) If the AUC is greater than 0.5 then something is wrong. (b) Attempt black-box attacks using gradient estimation. (c) Attempt transfer attacks from the undefended model. The authors used approach (c) and provided the results in Appendix C.

4. The above evaluation may require more space than is available currently; removing the evaluation of the ideas that prior work has already shown don't work (or deferring it to an appendix) would give this space back. The suggestion here was to move some of the evaluation of ideas with known or expected results to the appendix so that the evaluation of new approaches could be more completely addressed in the paper. At the very least, the information in the appendix should be referenced in the main text as being available.

5. The authors are encouraged to release source code with the final paper. The authors indicate they will do this.

My final comment is that I wish the authors had spent more of the text calling out the results from Mel extraction and LPC, both of which look promising, as this seems to me to be the true result from the paper. (But the authors have not changed much of the text prior to Section 7.) Additionally, the authors should state in the paper that they are following the practices from [25] in choosing 4 adversarial targets. Given that the commands were never run against an ASR, but only a speech-to-text algorithm, I do wish that the authors had used random phrases selected from Mozilla as that would demonstrate the efficacy of their approach in the more general case.

(Aside: I was reviewer F in the previous round.)

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #352F
===========================================================================

Review recommendation
---------------------
5. Accept

Reviewer expertise
------------------
2. Some familiarity

Overall merit
-------------
4. Top 10% but not top 5% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
The paper proposes WaveGuard, a framework for detecting adversarial inputs that are crafted to attack ASR systems. The framework relies on input transformation functions and analyzes the transcriptions of original and transformed audio to label the input as adversarial or benign. The paper also shows that Linear Predictive Coding (LPC) and Mel spectrogram extraction-inversion are more robust to adaptive attacks as compared to other transformation functions. The paper also evaluates the framework against adaptive attacks.

Strengths
---------
- The approach addresses adversarial attacks in the audio domain, which is crucial.
- It also takes into account protection against adaptive attacks without assuming a “zero-knowledge” threat model.
- The evaluation shows the optimal compression levels for each input transformation defense.

Comments for author
-------------------
Probably mentioned somewhere in the paper, but it would be helpful if the paper provided an overview of the available set of input transformations that can be used for defense purposes. Provide a justification for the five selected input transformations.

**************************************************
USENIX Fall Review #352B
===========================================================================
* Updated: 19 Jan 2021 10:59:12am PST

Review recommendation
---------------------
2. Reject and resubmit

Reviewer expertise
------------------
4. Expert

Overall merit
-------------
2. Top 50% but not top 25% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
This paper introduces WaveGuard, a technique to detect and prevent audio adversarial examples from occurring. The technique proposes several preprocessors g(x) and then detects whether or not an input is adversarial by comparing the classification f(x) with the classification of f(g(x)). The detection is highly accurate, and prevents most attacks using LPC and Mel Inversion.

Strengths
---------
+ The proposed technique is interesting and on an important problem.
+ The paper performs adaptive attacks to try and break the proposed defense proposals, and indeed breaks many of their weaker proposals.
+ The writing quality is high, with all details explained.

Weaknesses
----------
- The prior paper's main claim, that LPC was effective, is no longer supported. It looks like Mel is now the stronger claim, but Mel was less thoroughly evaluated. As a result, I worry that if Mel is given the same evaluation treatment as LPC then the results here will drop as well.

Comments for author
-------------------
I was reviewer C last time. Beginning with a direct response to the authors' response:

1. The ROC curve in Appendix A is the ROC curve for the *non-adaptive* attack. This is uninteresting. Security papers do not care about how a system performs when the adversary is unaware of the defense; only the adaptive ROC curve matters. The paper does not present this.

2. These constants help understand the defense.

3. Longer discussion below.

4. Thank you for this analysis.

5. It is wonderful to have code (especially now); however, there appear to be some issues with it. I will not hold this against the paper because any code is better than no code, and I wouldn't have held no code against the paper at submission time.

- On my machine, running the Jupyter notebook fails in that the output of the audio wavefile is always 0.
This is caused because the method voiced_unvoiced always returns [0,0,0...]. I don't know why this is.

- It's great to show that f_tf(x) == f_np(x), but that's kind of just the bare minimum. It would also be good to know that the gradients are correct, for example by numerically performing finite differences on f_np and comparing to f_tf.

--

Longer discussion around comment 3:

The paper is much improved with respect to its evaluation. The prior paper argued robustness that was so strong as to be impossible, and with the new and improved attacks the paper claims much weaker results. For example, the prior paper claimed 100% AUC at a distortion bound of 16,000 whereas the new revision claims 77% AUC at 1,000 distortion. The authors do note that this result is still significantly stronger than anything published previously in the literature for defenses to audio adversarial examples.

However, the results are now somewhat messier. The prior paper was trying to make the case that LPC was a good defense and is what should be used; the new paper advocates instead for Mel Extraction---or, rather, it should, because that's what gives stronger results.

Reviewer feedback on authors' response and online discussion
------------------------------------------------------------
Given the final ROC curves for the adaptive attack, it no longer looks like the main claims of the original paper are supported. The first submitted version of the paper argued that LPC was a (very) robust defense. With the new results, it looks like Mel is now the stronger of the defenses, and LPC doesn't work as well. However, Mel wasn't as thoroughly evaluated as LPC was, and so it's unclear if this robustness claim will hold true.

This could still be an interesting paper; however, the main claims from the original paper no longer hold true. The focus of the paper must now shift to the alternate method, which empirically works better, and this is beyond the scope of a revision.

**************************************************
USENIX Summer: Review #62A
===========================================================================

Review recommendation
---------------------
4. Minor revision

Reviewer expertise
------------------
3. Knowledgeable

Overall merit
-------------
3. Top 25% but not top 10% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
This paper proposes a transformation-based method for detecting audio adversarial examples on automatic speech recognition systems. The authors explore five different audio transformations and compare the character error rate (CER) between the clean and transformed phrase outputs, hypothesizing that adversarial clips will have greater error. The authors conclude that WaveGuard can successfully detect adversarial examples, even in the presence of adaptive attacks.

Strengths
---------
- Tests adaptive attacks
- Good detection accuracy across a variety of attacks
- Good comparisons between different transformations and attack variations

Weaknesses
----------
- Lacks direct comparison to existing detection techniques
- Can only detect adversarial examples, and can’t necessarily correct them

Comments for author
-------------------
Overall, this is a good submission. The authors do a good job presenting their research, which provides a detection-level defense for audio adversarial examples.
The authors test five input transformations: quantization-dequantization, down-sampling and up-sampling, filtering, Mel spectrogram extraction-inversion, and linear predictive coding (LPC). The intuition is that attacked audio clips will display a wider transcription difference between un-transformed and transformed inputs than clean audio clips will.

One strength is the variety of attacks that the defense is evaluated on. The defense is evaluated across untargeted universal attacks and targeted attacks, including Carlini and Wagner’s DeepSpeech attack and Qin et al.’s robust and imperceptible attacks on Lingvo. Adaptive attacks are tested with a straight-through estimator in the BPDA approach for 4 out of the 5 transformations, with a differentiable version implemented for the Mel spectrogram defense. The authors find that their approach succeeds in each of these settings.

The authors also provide a variety of measurements, including deltas and dB for measuring distortions and success rates and CERs for attack performance in Table 4, a comparison of CERs across the various transformations and attacks for un-transformed vs transformed vs clean vs adversarial inputs in Figure 7 to provide context to their attack, and timing information in Table 3. This helps provide some nice context and intuition to their results. In particular, Figure 7 helps display the differences between adversarial and clean examples that can be used to detect attacks.

Since the conclusion notes state-of-the-art detection performance, I would like to see some direct comparison of this system with existing detection techniques. While the authors argue that WaveGuard achieves good performance (with 100% detection accuracy for the LPC transformation and other high numbers on other transformations), without direct comparison it is harder to note the amount of improvement made over state-of-the-art techniques.

It would also be nice to add some citations in the LPC section – that LPC models the human vocal tract system becomes important to the intuition and explanation of the best-performing defense here, but it is not entirely clear why or how this is the case. As the authors note, it could be interesting to explore whether stronger adaptive attacks for the other transformations could be made, as currently only the Mel Extraction transformation uses something other than the straight-through estimator.

The paper is also framed well in terms of taking inspiration from image processing techniques (e.g., JPEG compression) and stating where they succeed in audio where image-based techniques have failed.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #62B
===========================================================================

Review recommendation
---------------------
4. Minor revision

Reviewer expertise
------------------
4. Expert

Overall merit
-------------
4. Top 10% but not top 5% of submitted papers

Writing quality
---------------
5. Outstanding

Paper summary
-------------
The paper presents WaveGuard, a framework for defending speech recognition systems from adversarial inputs. The paper initially has a good background section covering adversarial attacks in the audio domain, as well as similar defences that have been devised for the image domain. Following this, a reasonable threat model is given, and the defence framework is proposed.
Multiple concrete implementations of techniques that conform to the framework are then discussed, and each of these is then evaluated as a potential defence. Following this, an adaptive attack is considered against the defence, where the attacker knows the defence being applied, including the parameters being used. This evaluation shows that for several of the functions this knowledge degrades the defence performance significantly, as the inclusion of the defence in the loss function allows an attacker to work around this. However, promising results are obtained for the Mel Extraction and Inversion and the LPC techniques. The amount of distortion required to degrade performance is high for Mel, and no success is obtained under either bound considered for LPC. The paper finishes with a discussion of the results and a conclusion.

Strengths
---------
- Paper is very well written, being easy to understand and clear to follow
- The problem has been carefully considered, and the framework produced clearly solves it
- Presenting multiple implementations and comparing them is interesting, and helps to better understand what makes a good function for use in the framework.
- Inclusion of an adaptive attacker is good, demonstrating that careful consideration has been given to experimental design and learning from other papers that have not done this.
- The explanations contained in the paper for varying results are good and informative.

Weaknesses
----------
- Evaluation could be clearer
- Limitations of the system should be discussed in more detail

Comments for author
-------------------
- The use of a separate set for determining the detection threshold is only mentioned once, at the tail of a paragraph (Section 5.2). This point needs expanding to detail how the set of 50 examples is produced and the makeup of the attacks contained within it (see the sketch of the detection rule at the end of this review).
- Similarly, in the adaptive attack evaluation it is not clear that the value of t is derived from the same original set of 50 (assuming this is the case), and that the threshold does not need adjusting now that the attacker is using a different technique. If the value of t is changed for the new technique then this needs explaining, especially as this would imply the system requires prior knowledge of the AE it is likely to face if it is to defend against it.
- There is no study of whether the system can be applied to types of AE that are not used in the training data. This transferability of the defence is important to know, and seems likely to be successful in this system. This is an important real-world consideration for use of the system, and could be tested by implementing another AE attack type and evaluating the system against it.
- Perhaps consider releasing source code for the experiments and defense. Releasing associated code is best practice, and is likely to increase the impact of a paper.

Requested Changes
-----------------
- Would the defence also work against hidden voice attacks such as those presented in 'Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems' by Abdullah et al.?
- Clarify how the set for determining the threshold t is constructed.
- Clarify whether the same set is used for determining the threshold for each of the implementations of the framework (perhaps even provide the values of t for each).
- Clarify whether the same threshold is used for the adaptive attack as was used for the non-adaptive experiments.
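For reference, here is a minimal sketch of the detection rule and threshold I am asking about. The function names, the exact CER normalization, and the callable placeholders (asr, g) are my own assumptions for illustration, not details taken from the paper:

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two transcriptions."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of hypothesis with respect to reference."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def is_flagged(asr, g, x, t):
    """Flag x as adversarial if the transcriptions of x and g(x) diverge.

    asr: callable mapping audio to a transcription string (f in the paper)
    g:   the input transformation under test (e.g., LPC, Mel extraction-inversion)
    t:   the detection threshold derived from the 50-example set (Section 5.2)
    """
    return cer(asr(x), asr(g(x))) > t
```

If this matches the paper's procedure, then the clarifications requested above amount to: where the 50 examples that fix t come from, whether the same t is reused for each implementation of g, and whether it is reused unchanged for the adaptive attack.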
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #62C
===========================================================================

Review recommendation
---------------------
3. Major revision

Reviewer expertise
------------------
4. Expert

Overall merit
-------------
1. Bottom 50% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
This paper introduces WaveGuard, a technique to detect and prevent audio adversarial examples from occurring. The technique proposes several preprocessors g(x) and then detects whether or not an input is adversarial by comparing the classification f(x) with the classification of f(g(x)). The detection is almost perfect, and prevents nearly all attacks when using the strongest LPC models.

Strengths
---------
+ The proposed technique is interesting and on an important problem.
+ The paper performs adaptive attacks to try and break the proposed defense proposals, and indeed breaks many of their weaker proposals.
+ The writing quality is high, with all details explained.

Weaknesses
----------
- The proposed technique comparing f(x) with f(g(x)) is not new.
- The defense cannot possibly be as strong as is claimed, indicating the attack is incorrectly performed. The paper claims robustness against an adversary with a distortion bound of 2.1dB. This means that the magnitude of the perturbation is *larger* than the magnitude of the original audio sample, and so it's impossible for the detection scheme to work at an AUC of 1.0.

Comments for author
-------------------
This review focuses exclusively on the evaluation that is performed of the proposed defense.

The paper evaluates the defense by constructing a loss function with three moving pieces: the 2-norm of the perturbation, the CTC loss of the transcript, and the loss on the detector. The paper correctly applies two different constants to weight these terms, but does not explain how the constants here are selected.

Algorithm 1 describes an incorrect adaptive attack: the test on Line 9 *MUST* include a check that not only does the classification match the target, but also that the detector has been fooled. Without doing this, there is no reason to minimize the combined weighted loss function because it will never actually come into play, except to reduce the attack success rate.

The proposal to apply BPDA whenever the gradients become ugly is not well motivated. As the authors of [40] say, "BPDA should not be treated as a general-purpose method for minimizing through arbitrarily non-differentiable functions. Rather, loss functions must be designed to work with BPDA and must already be almost differentiable." In particular, BPDA *only* works when the critical assumption that g(x) \sim x holds true; however, the authors do not actually validate that this is the case. Further, applying BPDA tends to make difficult-to-optimize formulations even more difficult. The paper would be greatly improved by first trying to develop a differentiable version of the LPC defense and attacking that, or justifying why this is impossible to do. (That it would be difficult to implement would not be a valid justification, as an attacker who truly wanted to break the scheme would do so regardless of annoyance.)

In particular, it's worrying that in Table 1 there are examples where the attack fails to succeed (the success rate is just 10%) and also the AUC detection rate is 1.0.
It should be trivial to at least succeed 100% of the time at an AUC detection rate of 1.0---by just using the unmodified attack. There isn't a total order on the (SR, AUC) tuple, but there is a partial order, and this attack at (10%, 1.0) is strictly worse than (100%, 1.0).

Things only get worse from here. In the final row of Table 4, the paper claims that with a distortion of 2.1dB, the attack still has only a 10% success rate and again an AUC of 1.0. However: a 2.1dB signal-to-noise ratio implies that the noise is *louder* than the signal, and therefore the attack could simply wipe out the original signal and replace it completely with a quiet desired adversarial transcript---or at the very least not be detected.

It may be useful to try running the attack with a distortion bound four times as large---64,000---and see if that works. If that attack doesn't succeed, then this would prove that the proposed attack is broken, because at 64,000 it should always be possible to turn any waveform into any other waveform. Given that the attack success rate does not go up from 2,000 to 16,000, I suspect that it won't go up when going to 64,000 as well. (If it works, this would not necessarily prove that the attack is right. It's like running a randomness test on a stream cipher. If there's any statistical regularity, it's definitely broken, but the converse does not imply anything useful.)

Things take one final turn for the worse: the universal and non-adaptive attack in Table 2 performs *better* than the adaptive attack proposed here, even at a distortion bound of 16,000. The AUC of that universal attack is just 0.91.

Taken together, these indicate that the adaptive evaluation is performed incorrectly, and the defense has not been properly evaluated.

(Finally, and this is just a minor comment on the writing, it's unclear why the paper, just sentences after acknowledging that JPEG compression and other similar image processing defenses have been completely broken, goes on to say that it's inspired by JPEG compression as a defense idea.)

Requested Changes
-----------------
Adjust the adaptive attack to evaluate the defense.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #62D
===========================================================================

Review recommendation
---------------------
3. Major revision

Reviewer expertise
------------------
2. Some familiarity

Overall merit
-------------
2. Top 50% but not top 25% of submitted papers

Writing quality
---------------
3. Adequate

Paper summary
-------------
The paper proposes a framework for detecting audio adversarial inputs targeting deep neural network-based ASR models.

Strengths
---------
1. The paper is easy to follow. I like the presentation.
2. The detection accuracy is pretty good, especially against a non-adaptive attack.

Weaknesses
----------
1. Most of the experiments in this paper are quite redundant. Prior work already showed similar findings.
2. All the attacks and defenses are evaluated mostly against a single ASR model (DeepSpeech).
3. Missing many details of the experiments.

Comments for author
-------------------
First, Yang et al. [ICLR’19] already showed that adversarial examples are not robust under audio input transformations such as quantization, filtering, down-sampling, etc. However, input transformation as a defense does not work well against an adaptive attack.
This paper devotes a great deal of space to conducting a similar experiment, which draws conclusions similar to those of Yang et al.: that an advanced and adaptive adversary can bypass simple input-transformation-based defenses. As such, the overall contribution of this submission is rather limited and lacks novelty. That said, the findings from the Mel spectrogram extraction/inversion and LPC experiments give some interesting and useful insights.

Second, while evaluating against targeted attacks, only a handful of phrases have been chosen as target transcriptions. Further, the authors tested the performance of the detector against a small number of adversarial samples. As a result, the evaluation process does not look rigorous enough.

Third, the authors tested most of their defenses only against the DeepSpeech ASR model. While this might be related to the fact that most audio AE attacks in the literature are performed against DeepSpeech, it would have been interesting if the authors had evaluated the same defenses against other ASR models, such as Wav2Letter, to see how detection performance varies between these ASR models.

Also, the paper does not provide any details of the decoding method --- whether they used a greedy decoder or a beam search decoder (with/without a language model) to get the output from the network. This is critical because WER/CER can vary significantly depending on the decoding mechanism being used to extract the final transcription for the audio input.

Similarly, the paper is missing many important details regarding the setup of the experiments. For example, it is not clear what exact hyper-parameters were used to generate the audio AEs for the four attacks being discussed.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #62E
===========================================================================

Review recommendation
---------------------
4. Minor revision

Reviewer expertise
------------------
2. Some familiarity

Overall merit
-------------
3. Top 25% but not top 10% of submitted papers

Writing quality
---------------
4. Well-written

Paper summary
-------------
The paper proposes WaveGuard, a defense framework that aims to detect adversarial examples constructed by four recent audio adversarial attacks. The framework uses different audio transformation functions for detection. The authors conducted experiments to demonstrate the effectiveness of WaveGuard and show it can withstand adaptive attackers.

Strengths
---------
The paper is easy to follow, well-organized, and well-written. The experiments demonstrate the effectiveness of the framework and analyze how the hyperparameters affect the transformation functions. The proposed framework is efficient (average wall-clock time is low across all input transformation methods).

Weaknesses
----------
The gradient estimation methods in the adaptive attack might not be accurate enough to create strong adversarial examples. In the adaptive attacker experiment, some experiments are missing.

Comments for author
-------------------
Overall, I think the paper is well-written and the methods are easy to understand. The experiments are well conducted. In particular, I like the analysis of how the hyperparameters affect the transformation functions.

My biggest concern is the accuracy of the gradient estimation methods in the adaptive attacker experiments. If the gradient estimation methods are not accurate enough (especially for Mel Extraction-Inversion and LPC), then the experiment results are not convincing.
Also, in the adaptive attacker experiment, some experiments are missing: I would like to see how different hyperparameters (e.g., the LPC order in LPC and the number of Mel bins in Mel Extraction-Inversion) affect the performance of the different methods against adaptive adversaries.

Other questions: I have questions about dB_x(\delta) in Table 4: how does dB_x(\delta) relate to attack performance and detection scores? In Figure 6(b), could you please explain the trend of AUC scores as the down-sampling rate increases? How many times do you run your experiments? Do the results show a large variance?

Requested Changes
-----------------
Add experiments on how different hyperparameters affect the performance of different input transformation methods against adaptive adversaries.

Questions for authors' response
-------------------------------
Please refer to “section I. Comments for author”.

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Review #62F
===========================================================================

Review recommendation
---------------------
3. Major revision

Reviewer expertise
------------------
2. Some familiarity

Overall merit
-------------
2. Top 50% but not top 25% of submitted papers

Writing quality
---------------
2. Needs improvement

Paper summary
-------------
The authors examined several transformation functions on audio inputs to determine what approach works best as a defense against attacks on automatic speech recognition systems (ASRs). The authors examined several attacks, including adaptive attacks that are aware of the defense in place. The results demonstrate that using LPC (linear predictive coding) works well as a defense against four different attacks.

Strengths
---------
+ LPC filtering shows significant promise
+ Testing performed against both static and adaptive attacks

Weaknesses
----------
- Experimental design choices are not well justified
- Generalizability of results not tested or discussed

Comments for author
-------------------
The authors have what I think is an interesting result in the paper; however, it seems to be buried in the paper and not well examined. For example, the authors claim in the introduction that their defense framework is a significant contribution, yet I'm not convinced that putting a filter in front of an ASR is a significant contribution. In contrast, the last contribution simply states that they investigate transformation functions and find some work well, whereas their results show that it is extremely difficult, based on their tests, to bypass an LPC transformation with an attack. I recommend that the authors call out their results more clearly, and focus more on examining the results in their paper.

My main concerns center around the experimental design, and the lack of descriptions provided for some of the choices made.

First, the authors choose four adversarial commands. Why only four commands? Why these specific four commands? Can the authors justify that these four commands are representative of the attack space? Or, if the authors are more generically demonstrating that they can extract the noise intended to change a text from A to B, then the text B can be more generic, such as choosing A from the set of 100 examples in Mozilla Common Voice, and then choosing B from the remaining 99 examples. The authors need to justify whether they are focusing on attacks, and thus use representative attacks, or whether their approach is generic, and thus use generic commands.
As an aside here, it is also unclear whether the same attack pairs were used for each transformation, or whether new combinations were created. It should be the same attack pairs for each transformation in order to ensure an apples-to-apples comparison.

In terms of the evaluation metrics (Section 5.2), the threshold was calculated on "a separate set containing 50 adversarial and benign examples". How separate was this set? Ideally, the 50 benign examples should not have been selected from the same set as the test set (e.g., from the first 100 examples in Common Voice). Similarly, the 50 adversarial examples should be separate from the four examples used in the testing (provided in Table 1). By testing the effectiveness of your transformations on essentially previously unseen data, you provide a more accurate representation of how your approaches might work in a deployment scenario. Also, the thresholds used for each defense should be provided (e.g., in Table 2).

Related to the evaluation metrics, I would like to see more discussion justifying the use of character error rates. This relates back to whether this approach is generic, or just for attacks. If the latter, then the real goal is to demonstrate that the adversarial attack failed, even if the CER might be low (e.g., the difference between "open my bank account" and "empty my bank account"; see the quick check at the end of this review). If the former, then there is a stronger justification for using CER.

In Section 4.3, you state that you set the high-shelf frequency threshold at 1.5 x C and the low-shelf frequency threshold at 0.1 x C. Why did you choose these values?

Each attack tested is targeted at only one ASR (either Lingvo or DeepSpeech). It is unclear whether there is a different threshold selected for each attack. The ideal experimental design would be to determine the threshold using one attack for Lingvo (DeepSpeech) and then demonstrate generalizability by providing the results for the other attack on that same ASR. In the absolute ideal circumstance, at least LPC (since it performed the best) would be tested against a previously unseen ASR and attack in order to demonstrate generalizability. (Ultimately, the question seems to really be one of whether the threshold calculated for one combination works for the other combinations. Instead, the authors seem to have chosen the optimal threshold for each combination, which provides the best results but does not indicate how a filtering approach might work in a deployment scenario with previously unseen attacks.)

Finally, the authors mention attacks embedded in music early in the paper (when discussing background and related work). I was hoping to see the defense deployed against such attacks (e.g., CommanderSong). Intuitively, the LPC defense in particular should be immune to that attack.
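To make the CER point above concrete, here is a quick check of the example pair mentioned earlier. The CER definition used here (character edits divided by the length of the reference transcript) is my own assumption and may differ from the paper's:

```python
# Quick check of the CER for the example pair above, assuming
# CER = Levenshtein(reference, hypothesis) / len(reference).
import Levenshtein  # pip install python-Levenshtein

ref = "open my bank account"
hyp = "empty my bank account"
print(Levenshtein.distance(ref, hyp))             # 4 character edits
print(Levenshtein.distance(ref, hyp) / len(ref))  # CER = 0.2
```

Whether a CER of 0.2 falls above or below the thresholds chosen for each transformation is exactly the information requested above; without the per-defense threshold values it is hard to judge whether such a textually small but semantically dangerous change would be flagged.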