SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION

Dai, Jingzhao

doi:10.25394/PGS.8050565.v1

JingzhaoDai_Thesis.pdf (920.24 kB)

thesis

posted on 2019-06-11, 17:39 authored by Jingzhao Dai

Speech recognition is widely applied to speech-to-text translation, voice-driven commands, human-machine interfaces, and so on [1]-[8]. It has become increasingly prevalent in people's lives in the modern age. To improve the accuracy of speech recognition, various algorithms, such as artificial neural networks and hidden Markov models, have been developed [1], [2].

In this thesis, speech recognition tasks with various classifiers are investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), and convolutional neural network (CNN). Two novel feature extraction methods are proposed: sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on Mel filter banks [9]. To suit the different classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are extracted. The 1D features are arrays of power coefficients in frequency bands and are used to train the SVM, KNN, and RF classifiers. The 2D features capture both frequency content and temporal variation: each consists of the power values in the decomposed bands versus consecutive speech frames. The 2D features, after geometric transformation, are used to train the CNN.
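As a rough illustration of the 1D and 2D feature layouts described above, the following Python sketch computes per-band power for each speech frame. It is not the thesis's SDWD or BPF implementation: it assumes rectangular FFT-domain bands with Mel-spaced edges in place of actual bandpass filters, and the function names, frame length, hop size, and number of bands are hypothetical choices made only for this example.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel-scale formula (assumed here; the thesis may use another variant).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_bands, fs):
    # n_bands + 1 edges, equally spaced on the Mel scale from 0 Hz to fs/2.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_bands + 1)
    return mel_to_hz(mels)

def band_power_2d(signal, fs, n_bands=8, frame_len=256, hop=128):
    # 2D feature: rows are frequency bands, columns are consecutive frames.
    edges = mel_band_edges(n_bands, fs)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    feats = np.zeros((n_bands, n_frames))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        for b in range(n_bands):
            # Sum spectral power over the bins that fall inside band b.
            mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
            feats[b, t] = power[mask].sum()
    return feats

def band_power_1d(signal, fs, **kwargs):
    # 1D feature: one power value per band, averaged over all frames.
    return band_power_2d(signal, fs, **kwargs).mean(axis=1)

# Example: a 1 kHz tone sampled at 8 kHz.
fs = 8000
tone = np.sin(2 * np.pi * 1000.0 * np.arange(fs) / fs)
feat_2d = band_power_2d(tone, fs)
feat_1d = band_power_1d(tone, fs)
```

In this sketch, a 1D array such as feat_1d could be passed to a classifier like SVM, KNN, or RF, while a 2D array such as feat_2d could be treated as an image-like input for a CNN.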

Speech from both male and female speakers is drawn from a recorded data set as well as a standard data set. First, the proposed feature extraction methods are applied to the recorded data, which has little noise and clear pronunciation. After many trials and experiments with this data set, a high recognition accuracy is achieved. The methods are then applied to the standard recordings, which have random characteristics including ambient noise and unclear pronunciation. Many experimental results validate the effectiveness of the proposed feature extraction techniques.

Degree Type

Master of Science in Engineering

Department

Electrical and Computer Engineering

Campus location

Hammond

Advisor/Supervisor/Committee Chair

Li-Zhe Tan

Additional Committee Member 2

Yao Xu

Additional Committee Member 3

Bin Chen

Keywords

Mel Frequency Cepstral Coefficients (MFCC); Sparse discrete wavelet decomposition; Bandpass filter banks; Support vector machine; Support vector classification; Random forest; K nearest neighbors; Convolutional neural networks; Electrical and Electronic Engineering not elsewhere classified

Licence

CC BY 4.0
