Purdue University Graduate School
Browse

Machine Learning Approaches for Speech Forensics

Download (20.74 MB)
thesis
posted on 2024-10-31, 12:26 authored by Amit Kumar Singh YadavAmit Kumar Singh Yadav

Several incidents report misuse of synthetic speech for impersonation attacks, spreading misinformation, and supporting financial frauds. To counter such misuse, this dissertation focuses on developing methods for speech forensics. First, we present a method to detect compressed synthetic speech. The method uses comparatively 33 times less information from compressed bit stream than used by existing methods and achieve high performance. Second, we present a transformer neural network method that uses 2D spectral representation of speech signals to detect synthetic speech. The method shows high performance on detecting both compressed and uncompressed synthetic speech. Third, we present a method using an interpretable machine learning approach known as disentangled representation learning for synthetic speech detection. Fourth, we present a method for synthetic speech attribution. It identifies the source of a speech signal. If the speech is spoken by a human, we classify it as authentic/bona fide. If the speech signal is synthetic, we identify the generation method used to create it. We examine both closed-set and open-set attribution scenarios. In a closed-set scenario, we evaluate our approach only on the speech generation methods present in the training set. In an open-set scenario, we also evaluate on methods which are not present in the training set. Fifth, we propose a multi-domain method for synthetic speech localization. It processes multi-domain features obtained from a transformer using a ResNet-style MLP. We show that with relatively less number of parameters, the proposed method performs better than existing methods. Finally, we present a new direction of research in speech forensics i.e., bias and fairness of synthetic speech detectors. By bias, we refer to an action in which a detector unfairly targets a specific demographic group of individuals and falsely labels their bona fide speech as synthetic. We show that existing synthetic speech detectors are gender, age and accent biased. They also have bias against bona fide speech from people with speech impairments such as stuttering. We propose a set of augmentations that simulate stuttering in speech. We show that synthetic speech detectors trained with proposed augmentation have less bias relative to detector trained without it.

Funding

A DATA-DRIVEN INTEGRATED APPROACH FOR SEMANTIC INCONSISTENCIES VERIFICATION (DARPA SEMAFOR)

United States Department of the Air Force

Find out more...

History

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Edward J. Delp

Additional Committee Member 2

Amy R. Reibman

Additional Committee Member 3

Fengqing M. Zhu

Additional Committee Member 4

Mary L. Comer