Sounding the Alarm: AI-Based Speech Recognition for Abuse and Suicide Detection
The increasing prevalence of online abuse and rising concerns over adolescent suicide necessitate advanced, scalable solutions for early detection and intervention. Speech-based artificial intelligence (AI) models offer a promising approach to identifying harmful speech patterns in real time. This thesis explores two complementary contributions in this domain: (1) FAST (Fast Audio Spectrogram Transformer), a novel architecture for efficient and robust audio classification, and (2) a multimodal deep learning framework for suicide risk assessment using both speech and text data.
FAST is designed as a lightweight yet powerful model that integrates convolutional neural networks (CNNs) for local feature extraction with transformers for global context modeling. Inspired by the MobileViT framework, FAST enhances real-time processing efficiency while maintaining high classification accuracy. Additionally, the incorporation of Lipschitz-continuous attention mechanisms stabilizes training and accelerates convergence. Evaluated on the ADIMA dataset for multilingual abuse detection and the broader AudioSet benchmark, FAST achieves state-of-the-art performance, surpassing existing models while using up to 150 times fewer parameters.
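To make the hybrid design concrete, the following minimal PyTorch sketch pairs a convolutional stem (local time-frequency features) with a single transformer-style block over the resulting tokens. The module names, dimensions, kernel sizes, and the distance-based (L2) self-attention used here as a stand-in for Lipschitz-continuous attention are illustrative assumptions, not the FAST implementation itself.

```python
# Illustrative sketch only: a minimal hybrid CNN-transformer audio classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2SelfAttention(nn.Module):
    """Distance-based self-attention: tying the query/key projection and scoring
    by negative squared L2 distance is one common way to bound the Lipschitz
    constant of attention (an assumption here, not the thesis's exact mechanism)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.heads, self.dk = heads, dim // heads
        self.qk = nn.Linear(dim, dim, bias=False)   # shared query/key projection
        self.v = nn.Linear(dim, dim, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                            # x: (B, N, dim)
        B, N, _ = x.shape
        qk = self.qk(x).view(B, N, self.heads, self.dk).transpose(1, 2).reshape(B * self.heads, N, self.dk)
        v = self.v(x).view(B, N, self.heads, self.dk).transpose(1, 2).reshape(B * self.heads, N, self.dk)
        dist = torch.cdist(qk, qk) ** 2              # (B*heads, N, N) squared distances
        attn = F.softmax(-dist / self.dk ** 0.5, dim=-1)
        y = (attn @ v).view(B, self.heads, N, self.dk).transpose(1, 2).reshape(B, N, -1)
        return self.out(y)

class HybridAudioClassifier(nn.Module):
    """Conv stem for local features, attention block for global context, pooled head."""
    def __init__(self, n_classes, dim=128):
        super().__init__()
        self.stem = nn.Sequential(                   # local features from a log-mel spectrogram
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.GELU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.GELU(),
        )
        self.norm = nn.LayerNorm(dim)
        self.attn = L2SelfAttention(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                         # spec: (B, 1, n_mels, time)
        f = self.stem(spec)                          # (B, dim, H, W)
        tokens = f.flatten(2).transpose(1, 2)        # (B, H*W, dim) patch tokens
        tokens = tokens + self.attn(self.norm(tokens))   # global context, residual
        return self.head(tokens.mean(dim=1))         # pooled class logits

# Example: two 64-mel spectrograms classified into abusive / non-abusive
logits = HybridAudioClassifier(n_classes=2)(torch.randn(2, 1, 64, 256))
```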
Beyond audio classification, this thesis extends AI-driven speech analysis to suicide risk assessment through a multimodal learning framework. Recognizing that suicidal tendencies manifest through both speech patterns and linguistic cues, we leverage a fusion of audio and text features to improve detection accuracy. Our model, developed for the 1st SpeechWellness Challenge, employs a feature-engineered approach to capture both shared and unique representations across modalities. Experimental results show that this approach outperforms the baseline model by 9%, reaching 65% accuracy on the development set. This highlights the potential of deep learning in augmenting traditional mental health screening methods.
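The sketch below illustrates one way to realize the shared/unique idea: pre-extracted audio and text feature vectors are projected into a common (shared) space and into modality-specific (unique) spaces before late fusion. The feature dimensions, layer names, and fusion choice are hypothetical and shown only to clarify the concept, not the challenge submission itself.

```python
# Illustrative sketch only: late fusion of audio and text features with
# shared and modality-specific projections.
import torch
import torch.nn as nn

class SharedUniqueFusion(nn.Module):
    def __init__(self, audio_dim=988, text_dim=768, hidden=128, n_classes=2):
        super().__init__()
        # Shared space: both modalities projected to a common representation.
        self.audio_shared = nn.Linear(audio_dim, hidden)
        self.text_shared = nn.Linear(text_dim, hidden)
        # Unique spaces: modality-specific information kept separately.
        self.audio_unique = nn.Linear(audio_dim, hidden)
        self.text_unique = nn.Linear(text_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(), nn.Dropout(0.3), nn.Linear(3 * hidden, n_classes)
        )

    def forward(self, audio_feats, text_feats):
        shared = self.audio_shared(audio_feats) + self.text_shared(text_feats)  # fused shared view
        fused = torch.cat([shared,
                           self.audio_unique(audio_feats),
                           self.text_unique(text_feats)], dim=-1)
        return self.classifier(fused)                # (B, n_classes) risk logits

# Example: one utterance with hypothetical 988-dim acoustic and 768-dim text embeddings
model = SharedUniqueFusion()
logits = model(torch.randn(1, 988), torch.randn(1, 768))
```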
By integrating insights from both audio classification and multimodal learning, this thesis discusses the broader applicability of AI-driven speech recognition for real-time risk detection. We analyze the commonalities and challenges across abuse detection and suicide risk assessment, proposing pathways for developing more generalized, ethically responsible, and scalable solutions. The findings underscore the potential of AI in mitigating harm through speech-based interventions while also emphasizing the importance of addressing ethical and deployment challenges in real-world applications. Future research directions include refining multimodal fusion techniques, improving model interpretability, and enhancing robustness in diverse linguistic and cultural contexts.
This work contributes to the growing field of AI for social good, demonstrating how efficient and intelligent speech recognition systems can play a crucial role in early intervention and harm prevention.
Degree Type
- Master of Science
Department
- Computer Science
Campus location
- West Lafayette