Speech Assistance using Artificial Neural Network

Ramy Mounir

Tampa, Florida

This project is intended to help people with speech impediments use speech-to-text applications with better accuracy.

Artificial Intelligence, Robotics

Description

As the name suggests, we will be using a Deep Bidirectional Recurrent Neural Network with LSTM cells (DBRNN) to achieve the state-of-the-art performance described by Graves et al. on a normal speech dataset (no speech impediment). The model will use Mel Frequency Cepstral Coefficients (MFCC) for filtering and feature extraction. Connectionist Temporal Classification (CTC) will be used as the cost function, since it aligns and labels unsegmented sequences without requiring a frame-by-frame alignment between audio and labels. A word-to-ARPAbet phoneme dictionary from CMU is used here as well.
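To make the role of CTC's blank label more concrete, here is a minimal sketch of how a per-frame label path can be turned into a phoneme sequence: repeated labels are merged, then blanks are dropped. It uses greedy (best-path) decoding, which is only the simplest decoder and not necessarily the one this project will use; the CTC cost function itself sums over all valid paths rather than picking one. The blank symbol `_`, the function names, and the toy probabilities are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; CTC reserves one extra label meaning "no output"


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label path: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def greedy_decode(frame_probs, labels, blank=BLANK):
    """Pick the most probable label in each frame, then collapse the resulting path."""
    best_path = [
        labels[max(range(len(row)), key=row.__getitem__)] for row in frame_probs
    ]
    return ctc_collapse(best_path, blank)


# Toy per-frame path over ARPAbet-style phonemes (illustrative values only).
path = ["_", "HH", "HH", "_", "AH", "AH", "_", "L", "_", "OW", "OW"]
print(ctc_collapse(path))
```

Because repeats are merged before blanks are removed, a blank between two identical labels is what allows a genuinely doubled phoneme to survive the collapse.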

The output phonemes are then post-processed by altering the phoneme sequence to generate potential words. Those words are then fed to another recurrent neural network that acts as a language model, assigning each word a probability given the previous word(s). Beam search will be used to scan the possible sentences efficiently.
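The sentence search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: a hand-written bigram table (`BIGRAM_LOGP`) stands in for the recurrent language model, the candidate word lists are invented, and `beam_search` and `UNSEEN_LOGP` are hypothetical names.

```python
import math

# Hypothetical toy bigram log-probabilities, standing in for the RNN language model.
BIGRAM_LOGP = {
    ("<s>", "I"): math.log(0.6),
    ("<s>", "eye"): math.log(0.1),
    ("I", "see"): math.log(0.5),
    ("I", "sea"): math.log(0.05),
    ("eye", "see"): math.log(0.1),
    ("eye", "sea"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-4)  # assumed fallback score for word pairs not in the table


def beam_search(candidates, beam_width=2):
    """Keep only the `beam_width` best partial sentences after each word position."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, words so far)
    for options in candidates:
        expanded = [
            (score + BIGRAM_LOGP.get((words[-1], word), UNSEEN_LOGP), words + [word])
            for score, words in beams
            for word in options
        ]
        beams = sorted(expanded, key=lambda beam: beam[0], reverse=True)[:beam_width]
    # Strip the start-of-sentence marker before returning.
    return [(score, words[1:]) for score, words in beams]


# Each position lists the potential words generated from the phoneme sequence.
candidates = [["I", "eye"], ["see", "sea"]]
best_score, best_sentence = beam_search(candidates)[0]
print(best_sentence)
```

Pruning to a fixed beam width keeps the cost linear in sentence length, at the price of possibly discarding a sentence that would have scored best overall.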

This project is likely to have a speaker-dependent version to increase the accuracy of automatic speech recognition. TensorFlow is the framework to be used in this project. A GUI will be designed using Qt Designer for easy use and demonstrations.
