Low-Resource Automatic Speech Recognition Domain Adaptation: A Case-Study in Aviation Maintenance
With timeliness and efficiency being critical in the aviation maintenance industry, the need has been growing for smart technological solutions that help in optimizing and streamlining the different underlying tasks. One such task is the technical documentation of the performed maintenance operations. Instead of paper-based documentation, voice tools that transcribe spoken logbook entries allow technicians to document their work right away in a hands-free and time efficient manner. However, an accurate automatic speech recognition (ASR) model requires large training corpora, which are lacking in the domain of aviation maintenance. In addition, ASR models which are trained on huge corpora in standard English perform poorly in such a technical domain with non-standard terminology. Hence, this thesis investigates the extent to which fine-tuning an ASR model, pre-trained on standard English corpora, on limited in-domain data improves its recognition performance in the technical domain of aviation maintenance. The thesis presents a case study on one such pre-trained ASR model, wav2vec 2.0. Results show that fine-tuning the model on a limited anonymized dataset of maintenance logbook entries brings about a significant reduction in its error rates when tested on not only an anonymized in-domain dataset, but also a non-anonymized one. This suggests that any available aviation maintenance logbooks, even if anonymized for privacy, can be used to fine-tune general-purpose ASR models and enhance their in-domain performance. Lastly, an analysis on the influence of voice characteristics on model performance stresses the need for balanced datasets representative of the population of aviation maintenance technicians.
History
Degree Type
- Master of Science
Department
- Computer and Information Technology
Campus location
- West Lafayette