Defending against Adversarial Attacks in Speaker Verification Systems
With the advance of the technologies of Internet of things, smart devices or virtual personal assistants at home, such as Google Assistant, Apple Siri, and Amazon Alexa, have been widely used to control and access different objects like door lock, blobs, air conditioner, and even bank accounts, which makes our life convenient. Because of its ease for operations, voice control becomes a main interface between users and these smart devices. To make voice control more secure, speaker verification systems have been researched to apply human voice as biometrics to accurately identify a legitimate user and avoid the illegal access. In recent studies, however, it has been shown that speaker verification systems are vulnerable to different security attacks such as replay, voice cloning, and adversarial attacks. Among all attacks, adversarial attacks are the most dangerous and very challenging to defend. Currently, there is no known method that can effectively defend against such an attack in speaker verification systems.
The goal of this project is to design and implement a defense system that is simple, light-weight, and effectively against adversarial attacks for speaker verification. To achieve this goal, we study the audio samples from adversarial attacks in both the time domain and the Mel spectrogram, and find that the generated adversarial audio is simply a clean illegal audio with small perturbations that are similar to white noises, but well-designed to fool speaker verification. Our intuition is that if these perturbations can be removed or modified, adversarial attacks can potentially loss the attacking ability. Therefore, we propose to add a plugin-function module to preprocess the input audio before it is fed into the verification system. As a first attempt, we study two opposite plugin functions: denoising that attempts to remove or reduce perturbations and noise-adding that adds small Gaussian noises to an input audio. We show through experiments that both methods can significantly degrade the performance of a state-of-the-art adversarial attack. Specifically, it is shown that denoising and noise-adding can reduce the targeted attack success rate of the attack from 100% to only 56% and 5.2%, respectively. Moreover, noise-adding can slow down the attack 25 times in speed and has a minor effect on the normal operations of a speaker verification system. Therefore, we believe that noise-adding can be applied to any speaker verification system against adversarial attacks. To the best of our knowledge, this is the first attempt in applying the noise-adding method to defend against adversarial attacks in speaker verification systems.