Augmenting dysphonia voice using Fourier-Based synchrosqueezing transform (FSST) for a CNN classifier
Alice Rueda
Toronto, Ontario
Dysphonia voice studies are consistently limited by small datasets, which makes it difficult to apply more sophisticated deep learning techniques without overfitting or underfitting. A convolutional neural network (CNN) is a powerful classifier, but it requires a large amount of training data, and data augmentation techniques for voice are limited. The Fourier-based synchrosqueezing transform (FSST) can be used as a data augmentation technique to increase the data size. The results indicated that FSST not only increases the data size but also lets the CNN learn better than it does with the Short-Time Fourier Transform (STFT) power spectrum. The loss function converges with FSST input but not with STFT input. FSST is also more stable and provides more accurate results.
Project status: Published/In Market
Intel Technologies
AI DevCloud / Xeon
Overview / Usage
This project showcases that augmentation techniques exist that can enlarge a small dataset enough to make it suitable for training a deep neural network (DNN).
Methodology / Approach
Instead of feeding the typical spectrogram into the CNN, a sharper and sparser time-frequency representation was used. The results showed that, for a limited pathological dataset, there are not enough samples for even a simple CNN to learn from the spectrogram. However, the sharper and sparser representation can train the CNN even with a sample size as small as 100 per class.
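To illustrate why the FSST representation is sharper and sparser than a spectrogram, here is a minimal NumPy sketch of a simplified FSST. It is not the project's actual code. The function name `fsst`, the Gaussian window, its width, and the `n_fft`, `hop`, and `rel_thresh` defaults are all assumptions chosen for illustration. The sketch computes the STFT twice (once with the window, once with the window's time derivative), estimates an instantaneous frequency for every coefficient, and moves each coefficient to the frequency bin nearest that estimate.

```python
import numpy as np

def fsst(x, fs, n_fft=256, hop=32, rel_thresh=1e-6):
    """Simplified FSST sketch: returns (T, V, freqs).

    T is the synchrosqueezed STFT and V the plain STFT, both shaped
    (n_frames, n_fft // 2 + 1). All defaults are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    half = n_fft // 2
    # Gaussian window g and its analytic time derivative dg (assumed window).
    t = np.arange(-half, half) / fs
    sigma = n_fft / (8.0 * fs)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    dg = -(t / sigma**2) * g

    # Slice the reflect-padded signal into frames centered on each hop.
    xp = np.pad(x, half, mode="reflect")
    starts = np.arange(0, len(x), hop)
    frames = np.stack([xp[s:s + n_fft] for s in starts])

    # STFT with g and with dg; ifftshift puts the window center at index 0
    # so that neighboring bins are phase-aligned before they are summed.
    V = np.fft.rfft(np.fft.ifftshift(frames * g, axes=1), axis=1)
    Vd = np.fft.rfft(np.fft.ifftshift(frames * dg, axes=1), axis=1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    # Instantaneous-frequency estimate per bin, skipping near-zero coefficients.
    mag = np.abs(V)
    keep = mag > rel_thresh * mag.max()
    V_safe = np.where(keep, V, 1.0)
    f_hat = freqs[None, :] - np.imag(Vd / V_safe) / (2.0 * np.pi)

    # Squeeze: move each coefficient to the bin nearest its estimated frequency.
    k_hat = np.rint(f_hat * n_fft / fs).astype(int)
    keep &= (k_hat >= 0) & (k_hat < freqs.size)
    T = np.zeros_like(V)
    rows = np.nonzero(keep)[0]
    np.add.at(T, (rows, k_hat[keep]), V[keep])
    return T, V, freqs

# Example: a synthetic 1 kHz tone (a stand-in for a voice recording).
fs = 8000
tone = np.cos(2 * np.pi * 1000 * np.arange(fs) / fs)
T, V, freqs = fsst(tone, fs)
fsst_image = np.abs(T) ** 2   # sharper, sparser input image
stft_image = np.abs(V) ** 2   # conventional spectrogram for comparison
```

For the synthetic tone, the STFT power is smeared across several neighboring bins, whereas the squeezed power collects in the single bin at 1 kHz. How the resulting images were scaled, cropped, or batched for the CNN in this project is not stated, so that step is left out.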
Technologies Used
TensorFlow, data pipelining through TFRecords, DevCloud