Speech Assistance using Artificial Neural Network

Ramy Mounir

Tampa, Florida

This project aims to help people with speech impediments use speech-to-text applications with better accuracy.

Artificial Intelligence, Robotics

Description

As the name suggests, we will use a Deep Bidirectional Recurrent Neural Network with LSTM cells (DBRNN), trained on a normal speech dataset (no speech impediment), to reach the state-of-the-art performance described by Graves et al. The model will use Mel Frequency Cepstral Coefficients (MFCC) for filtering and feature extraction. Connectionist Temporal Classification (CTC) will serve as the cost function, since it aligns and labels unsegmented sequences. A word-to-ARPAbet phoneme dictionary from CMU is used as well.
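To illustrate how CTC lets a network label unsegmented audio, here is a minimal sketch of the collapse step applied to per-frame predictions: consecutive repeats are merged, then blank labels are dropped. The blank symbol "_", the function name ctc_collapse, and the toy frame sequence are assumptions made for illustration; they are not taken from the project's code.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol chosen here for the CTC blank label


def ctc_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label sequence to an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove blank labels.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Toy per-frame predictions for the word "cat" (ARPAbet: K AE T).
frames = ["_", "K", "K", "_", "AE", "AE", "AE", "_", "T", "T", "_"]
print(ctc_collapse(frames))  # ['K', 'AE', 'T']
```

A blank between two identical labels keeps them separate, so ["L", "_", "L"] collapses to two L phonemes rather than one. This is why the network needs no frame-level alignment in the training data.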

The output phonemes are then post-processed: the phoneme sequence is altered to generate potential words. Those words are fed to another recurrent neural network that acts as a language model, assigning each word a probability given the previous word(s). Beam search will be used to scan possible sentences efficiently.
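The search over candidate sentences can be sketched as follows. This toy example assumes each position in the utterance already has a short list of candidate words, and it swaps in a hand-written bigram table for the recurrent language model. The probabilities, the FLOOR fallback, and the names lm_prob and beam_search are all hypothetical.

```python
import math

# Hypothetical bigram probabilities standing in for the RNN language model.
BIGRAM = {
    ("<s>", "i"): 0.6, ("<s>", "eye"): 0.1,
    ("i", "see"): 0.5, ("i", "sea"): 0.05,
    ("eye", "see"): 0.1, ("eye", "sea"): 0.1,
    ("see", "you"): 0.6, ("sea", "you"): 0.05,
}
FLOOR = 1e-4  # assumed fallback probability for unseen word pairs


def lm_prob(prev, word):
    """Probability of `word` given the previous word."""
    return BIGRAM.get((prev, word), FLOOR)


def beam_search(candidates, beam_width=2):
    """Keep only the `beam_width` best partial sentences at each step."""
    beams = [(0.0, ["<s>"])]  # (log-probability, words so far)
    for options in candidates:
        expanded = [
            (score + math.log(lm_prob(words[-1], w)), words + [w])
            for score, words in beams
            for w in options
        ]
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = expanded[:beam_width]
    best_score, best_words = beams[0]
    return best_words[1:], best_score  # drop the "<s>" start marker


# Candidate words per position, e.g. homophones produced from the phonemes.
candidates = [["eye", "i"], ["sea", "see"], ["you"]]
sentence, score = beam_search(candidates)
print(" ".join(sentence))  # i see you
```

Pruning to a fixed beam width keeps the work per step proportional to beam_width times the number of candidates, instead of growing with every possible sentence.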

This project will likely include a speaker-dependent version to increase the accuracy of automatic speech recognition. TensorFlow is the framework to be used in this project. A GUI will be designed using Qt Designer for easy use and demonstrations.
