Imperceptible, Robust and Targeted Adversarial Examples for Automatic Speech Recognition

Abstract

Adversarial examples are inputs to machine learning models designed by an adversary to cause an incorrect output. In this work, we perform a white-box attack on the state-of-the-art Lingvo automatic speech recognition (ASR) system, using the LibriSpeech test dataset. First, we develop effectively imperceptible audio adversarial examples (verified through a human study) by leveraging the psychoacoustic principle of auditory masking, while retaining a 100% targeted success rate on arbitrary full-sentence targets. Next, we make progress towards physical-world over-the-air audio adversarial examples by constructing perturbations that remain effective even after applying realistic simulated environmental distortions. The details of the algorithms can be found in our paper and the implementations can be found here.

Imperceptible Adversarial Examples

To construct imperceptible adversarial examples for an automatic speech recognition system, we use frequency masking, the phenomenon in which a louder signal makes other signals at nearby frequencies imperceptible. We display two sets of audio examples below. Each set contains a clean audio sample, an adversarial example generated by Carlini’s method, and our imperceptible adversarial example. Listen to them carefully and choose which one is the clean audio.

First Set

Clean audio: “The sight of you bartley to see you living and happy and successful can I never make you understand what that means to me”

Carlini’s adversarial example: “Hers happened to be in the same frame too but she evidently didn’t care about that”

Our imperceptible adversarial example: “Hers happened to be in the same frame too but she evidently didn’t care about that”

Second Set

Carlini’s adversarial example: “This was so sweet a lady sir and in some manner i do think she died”

Our imperceptible adversarial example: “This was so sweet a lady sir and in some manner i do think she died”

Clean audio: “And to think we can save all that misery and despair by the payment of a hundred and fifty dollars”
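The frequency-masking idea in this section can be illustrated with a hinge-style penalty that is zero whenever the perturbation's spectrum stays below a masking threshold. The sketch below is a simplified illustration, not the paper's implementation: the function names `frame_psd_db` and `masking_hinge_loss`, the frame and hop sizes, and the plain dB scaling are all assumptions, and the masking threshold itself is taken as a given input because computing it from the clean audio is outside the scope of this example.

```python
import numpy as np

def frame_psd_db(signal, frame_len=2048, hop=512):
    """Per-frame power spectral density in dB, using a Hann window.

    Returns an array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1) / frame_len
    # The small constant avoids log(0) for silent frames.
    return 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-20)

def masking_hinge_loss(delta, threshold_db, frame_len=2048, hop=512):
    """Mean amount (in dB) by which the perturbation exceeds the threshold.

    `threshold_db` is a hypothetical precomputed masking threshold with the
    same shape as the output of `frame_psd_db(delta, frame_len, hop)`.
    """
    psd = frame_psd_db(delta, frame_len, hop)
    return float(np.mean(np.maximum(psd - threshold_db, 0.0)))
```

A penalty of this shape only pushes on the time-frequency bins where the perturbation would be audible, leaving the remaining bins free for the attack to use.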

Robust Adversarial Examples

Neither Carlini’s adversarial examples nor our imperceptible adversarial examples remain effective when played over the air. To improve the robustness of adversarial examples played over the air, we use the Image Source Method to create room impulse responses based on room configurations (e.g., the room dimensions and the locations of the audio source and the target microphone). We then convolve the room impulse responses with the audio to create artificial utterances (speech with reverberation) that mimic playing the audio over the air. Here is an example of a clean audio sample and its corresponding simulated audio with room reverberation.

Clean audio: “The more she is engaged in her proper duties the less leisure will she have for it even as an accomplishment and a recreation”

Simulated clean audio with reverberation: “The more she is engaged in her proper duties the less leisure will she have for it even as an accomplishment and a recreation”
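The convolution step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `toy_room_impulse_response` is a hypothetical stand-in that places a direct path and a few decaying echoes by hand, and it is not an implementation of the Image Source Method. The 16 kHz sample rate, echo delays and decay factor are likewise made up for the example.

```python
import numpy as np

def toy_room_impulse_response(sample_rate=16000,
                              echo_delays_s=(0.011, 0.023, 0.037),
                              decay=0.5):
    """Direct path plus a few decaying echoes (stand-in for a simulated RIR)."""
    length = int(round(max(echo_delays_s) * sample_rate)) + 1
    rir = np.zeros(length)
    rir[0] = 1.0  # direct path from source to microphone
    for i, delay in enumerate(echo_delays_s, start=1):
        rir[int(round(delay * sample_rate))] += decay ** i
    return rir

def apply_reverb(audio, rir):
    """Convolve audio with a room impulse response.

    The reverberant tail is truncated so the output keeps the input length.
    """
    return np.convolve(audio, rir)[: len(audio)]
```

Feeding a single unit impulse through `apply_reverb` returns the impulse response itself, which is a quick way to sanity-check any simulated room before using it on speech.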

To make the generated adversarial examples robust to various environments, we consider a challenging setting in which the exact configuration of the room where the attack will be performed is unknown. Instead, we only know the distribution from which the room configuration will be drawn. First, we generate 1000 random room configurations sampled from this distribution as the training room set. The test room set consists of another 100 random room configurations sampled from the same distribution. The constructed robust adversarial examples achieve an attack success rate of over 60% in the 100 test rooms. Below are two audio samples. One is the clean audio simulated as played over the air in one test room, and the other is our robust adversarial example simulated in the same test room. The noise in the background of the robust adversarial example is clearly audible.

Clean audio with reverberation: “Old dances are simplified of their yearning bleached by time”

Robust adversarial example with reverberation: “You don't seem to realize the position”
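The train/test room protocol described above can be sketched as follows. Everything specific here is an assumption made for illustration: the text does not state which distribution the rooms are drawn from, so the uniform ranges, the `RoomConfig` fields, and the `transcribe_in_room` callback (a hypothetical function that plays the adversarial audio in a simulated room and returns the ASR transcription) are placeholders.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class RoomConfig:
    dimensions_m: tuple   # (length, width, height) in metres
    source_m: tuple       # loudspeaker position
    microphone_m: tuple   # microphone position

def sample_room(rng, min_dim=(3.0, 3.0, 2.5), max_dim=(10.0, 8.0, 4.0)):
    """Draw one room from an assumed uniform distribution over box rooms."""
    dims = rng.uniform(min_dim, max_dim)
    # Keep source and microphone away from the walls.
    source = rng.uniform(0.1, 0.9, size=3) * dims
    mic = rng.uniform(0.1, 0.9, size=3) * dims
    return RoomConfig(tuple(dims), tuple(source), tuple(mic))

def attack_success_rate(transcribe_in_room, rooms, target):
    """Fraction of rooms in which the transcription equals the target phrase."""
    hits = sum(transcribe_in_room(room) == target for room in rooms)
    return hits / len(rooms)

rng = np.random.default_rng(0)
train_rooms = [sample_room(rng) for _ in range(1000)]  # used to build the attack
test_rooms = [sample_room(rng) for _ in range(100)]    # held out for evaluation
```

Evaluating only on the held-out rooms measures whether the perturbation generalizes to unseen rooms from the same distribution, rather than to the specific rooms it was optimized on.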

Imperceptible & Robust Attacks

By combining the two techniques developed above, we can generate adversarial examples that are both imperceptible and robust, achieving around 50% attack success rate in 100 simulated test rooms. Here we display four sets. Each set includes three audio samples convolved with the same simulated room reverberation: the clean audio, a robust adversarial example, and an imperceptible and robust adversarial example. If you listen carefully, you should be able to hear obvious noise in the background of the robust adversarial example. The perturbation in the imperceptible and robust adversarial example is much less perceptible than in the robust adversarial example, but the example can still be distinguished from the clean audio.

First Set

Clean audio: “It is so made that everywhere we feel the sense of punishment”

Robust adversarial example: “Said missus horton a few minutes after”

Imperceptible and robust adversarial example: “Said missus horton a few minutes after”

Second Set

Robust adversarial example: “If spoken to she would not speak again”

Clean audio: “Come and get the boolooroo she said going toward the benches”

Imperceptible and robust adversarial example: “If spoken to she would not speak again”

Third Set

Imperceptible and robust adversarial example: “I suppose that's the wet season too then”

Clean audio: “Were i in the warm room with all the splendor and magnificence”

Robust adversarial example: “I suppose that's the wet season too then”

Fourth Set

Robust adversarial example: “A terrible thought flashed into my mind”

Clean audio: “He's another who's awfully keen about her let me introduce you”

Imperceptible and robust adversarial example: “A terrible thought flashed into my mind”

Yao Qin¹, Nicholas Carlini², Ian Goodfellow², Garrison Cottrell¹, Colin Raffel²

¹University of California San Diego, ²Google Brain
