FEW-SHOT TRANSFER LEARNING FOR COUGH SOUND CLASSIFICATION
Cough sound analysis has gained traction as a non-invasive tool for diagnosing respiratory illnesses such as COVID-19 and influenza. Our literature review covered a broad spectrum of approaches, including traditional machine learning, deep learning, and transfer learning, across image, audio, text, and multimodal data. Deep learning models, especially CNNs and transformers, have achieved high accuracy, but they are constrained by the need for large labeled datasets. This review led us to identify few-shot learning (FSL) as a promising alternative for data-constrained medical diagnostics.
In this study, we implement a Prototypical Network-based FSL model for classifying cough sounds into three classes: Healthy, COVID-19, and Flu. We curate a balanced dataset from COUGHVID, Coswara, and FluSense, converting audio samples into Mel-spectrograms using Librosa. These are fed into a modified ResNet-18 backbone integrated with the EasyFSL framework for episodic N-way, K-shot training. Our evaluation spans binary and multi-class classification tasks, varies the number of support examples, and uses two statistical tests (Two One-Sided Tests, TOST, and a bootstrap-based equivalence test) to assess performance differences.
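The classification rule at the core of a Prototypical Network can be illustrated with a minimal sketch. It assumes the embeddings have already been produced by a backbone; the 2-D vectors below are made up for illustration, and the helper names `prototypes` and `classify` are hypothetical, not part of EasyFSL or of the thesis code.

```python
import math


def prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return protos


def classify(query, protos):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Toy 3-way, 2-shot episode. The embeddings are invented for illustration;
# in practice they would come from the backbone applied to Mel-spectrograms.
support = {
    "Healthy": [[0.0, 0.0], [0.2, 0.0]],
    "COVID-19": [[1.0, 1.0], [1.2, 1.0]],
    "Flu": [[0.0, 2.0], [0.0, 2.2]],
}
protos = prototypes(support)
prediction = classify([1.0, 0.9], protos)
print(prediction)
```

In an N-way, K-shot episode, `support` holds K embeddings for each of the N classes, and every query is assigned to the class with the closest prototype.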
Experimental results show that the multi-class model achieves 72.07% accuracy with only 15 support examples per class, while binary models exceed 74% accuracy across all class pairs. Statistical analysis reveals no significant performance drop between multi-class and binary setups, supporting the use of FSL in low-resource healthcare contexts. This work demonstrates the practical viability of few-shot learning for audio-based disease screening and contributes to scalable, efficient diagnostic solutions where labeled data is limited.
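As a rough illustration of how a bootstrap-based equivalence check can work, the sketch below resamples two sets of per-run accuracies and tests whether the confidence interval of their mean difference lies inside an equivalence margin. The abstract does not specify the procedure used in the thesis, so this is a generic percentile-bootstrap sketch; the accuracy values, the margin of 0.05, and the function name `bootstrap_equivalence` are all assumptions made for illustration, not reported results.

```python
import random


def bootstrap_equivalence(a, b, margin, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap equivalence check on the difference of means.

    Returns (low, high, equivalent), where [low, high] is the (1 - 2*alpha)
    interval of mean(a) - mean(b) and `equivalent` is True when that
    interval lies strictly inside (-margin, +margin).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(
            sum(resampled_a) / len(resampled_a)
            - sum(resampled_b) / len(resampled_b)
        )
    diffs.sort()
    low = diffs[round(alpha * n_boot)]
    high = diffs[round((1 - alpha) * n_boot) - 1]
    return low, high, (-margin < low and high < margin)


# Hypothetical per-run accuracies, NOT the thesis's data.
multi_class = [0.70, 0.72, 0.74, 0.71, 0.73]
binary = [0.71, 0.73, 0.72, 0.74, 0.70]

low, high, equivalent = bootstrap_equivalence(multi_class, binary, margin=0.05)
print(equivalent)
```

Using the (1 - 2α) interval mirrors the logic of the Two One-Sided Tests procedure at level α: equivalence is claimed only when both the lower and the upper bound fall within the chosen margin.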
Degree Type
- Master of Science
Department
- Computer and Information Technology
Campus location
- West Lafayette