File(s) under embargo
Reason: Publications under review.
A DEEP LEARNING BASED FRAMEWORK FOR NOVELTY AWARE EXPLAINABLE MULTIMODAL EMOTION RECOGNITION WITH SITUATIONAL KNOWLEDGE
Mental health significantly impacts issues such as gun violence, school shootings, and suicide. There is a strong connection between mental health and emotional states. By monitoring emotional changes over time, we can identify triggering events, detect early signs of instability, and take preventive measures. This thesis focuses on the development of a generalized and modular system for human emotion recognition and explanation based on visual information. The aim is to address the challenges of effectively utilizing the different cues (modalities) available in the data to build a reliable and trustworthy emotion recognition system. The face is one of the most important media through which we express our emotions. Therefore, we first propose SAFER, a novel facial emotion recognition system with background and place features. We provide a detailed evaluation framework to demonstrate its high accuracy and generalizability. However, relying solely on facial expressions for emotion recognition can be unreliable, as faces can be covered or deceptive. To enhance the system's reliability, we introduce EMERSK, a multimodal emotion recognition system that integrates various modalities, including facial expressions, posture, gait, and scene background, in a flexible and modular manner. It employs convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and denoising autoencoders to extract features from facial images, posture, gait, and scene background. In addition to multimodal feature fusion, the system utilizes situational knowledge derived from place type and adjective-noun pairs (ANPs) extracted from the scene, as well as the spatio-temporal average distribution of emotions, to generate comprehensive explanations for the recognition outcomes. Extensive experiments on different benchmark datasets demonstrate the superiority of our approach over existing state-of-the-art methods.
The system achieves improved performance in accurately recognizing and explaining human emotions. Moreover, we investigate the impact of novelty, such as face masks during the COVID-19 pandemic, on emotion recognition. The study critically examines the limitations of mainstream facial expression datasets and proposes a novel dataset specifically tailored for facial emotion recognition with masked subjects. Additionally, we propose a continuous learning-based approach that incorporates a novelty detector working in parallel with the classifier to detect and properly handle instances of novelty. This approach ensures robustness and adaptability in the automatic emotion recognition task, even in the presence of novel factors such as face masks. This thesis contributes to the field of automatic emotion recognition by providing a generalized and modular approach that effectively combines multiple modalities, ensuring reliable and highly accurate recognition. Moreover, it generates situational knowledge that is valuable for mission-critical applications and provides comprehensive explanations of the output. The findings and insights from this research have the potential to enhance the understanding and utilization of multimodal emotion recognition systems in various real-world applications.
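As an illustration of the idea of a novelty detector running in parallel with the classifier, the following is a minimal sketch only, not the thesis's implementation. It assumes the classifier is any function returning class probabilities, and it swaps in a simple distance-to-mean novelty score in place of whatever detector the thesis uses. The names `EMOTIONS`, `DistanceNoveltyDetector`, `NoveltyAwareRecognizer`, and `toy_classifier`, the label set, and the threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, List, Sequence, Tuple

# Hypothetical label set, chosen only for this sketch.
EMOTIONS = ("angry", "happy", "neutral", "sad")


@dataclass
class DistanceNoveltyDetector:
    """Assumed detector: flags features that lie far from the training mean."""
    mean: Sequence[float]
    threshold: float

    def score(self, features: Sequence[float]) -> float:
        # Euclidean distance to the mean of the known (e.g. unmasked) data.
        return sqrt(sum((x - m) ** 2 for x, m in zip(features, self.mean)))

    def is_novel(self, features: Sequence[float]) -> bool:
        return self.score(features) > self.threshold


@dataclass
class NoveltyAwareRecognizer:
    """Runs the classifier and the novelty detector on the same features."""
    classify: Callable[[Sequence[float]], Sequence[float]]
    detector: DistanceNoveltyDetector
    novel_buffer: List[Sequence[float]] = field(default_factory=list)

    def predict(self, features: Sequence[float]) -> Tuple[str, bool]:
        probs = self.classify(features)
        label = EMOTIONS[max(range(len(probs)), key=probs.__getitem__)]
        novel = self.detector.is_novel(features)
        if novel:
            # Keep novel samples aside so they can be used for later updates
            # instead of silently trusting the prediction.
            self.novel_buffer.append(features)
        return label, novel


def toy_classifier(features: Sequence[float]) -> Sequence[float]:
    # Placeholder standing in for a trained network.
    if features[0] >= 0:
        return [0.1, 0.7, 0.1, 0.1]
    return [0.1, 0.1, 0.1, 0.7]


recognizer = NoveltyAwareRecognizer(
    classify=toy_classifier,
    detector=DistanceNoveltyDetector(mean=[0.0, 0.0], threshold=2.0),
)
familiar = recognizer.predict([0.1, 0.0])
unfamiliar = recognizer.predict([5.0, 5.0])
```

In this sketch the prediction is still returned for a novel input, but it is paired with a flag so that a downstream component can decide whether to trust it, and the buffered samples are available for incremental retraining.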
- Doctor of Philosophy
- Computer Science
- West Lafayette