End-to-End Speech Emotion Recognition Using Deep Learning
Keywords:
Deep neural networks, DNN architecture, Voice Activity Detector (VAD), Deep learning
Abstract
This work tests speech emotion recognition with a deep neural network built from convolutional, pooling, and fully connected layers. The technique was evaluated on the German Corpus database, which contains audio recordings of ten actors, five male and five female. Each actor is recorded in seven emotional states, of which only three are used here. The recordings are divided into 20-millisecond segments; segments that contain no speech are identified by a Voice Activity Detector (VAD) and removed. The remaining segments are divided into training, validation, and testing sets. The deep neural network is optimized using stochastic gradient descent. In the experiment, the model reached an accuracy of around 96%.
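The preprocessing steps described in the abstract (20 ms segmentation, removal of empty segments, and a three-way split) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 16 kHz sample rate, the RMS energy threshold used as a stand-in for the Voice Activity Detector, the 80/10/10 split ratios, and all function names are assumptions made for illustration only; the abstract does not specify them.

```python
import math
import random


def split_into_frames(samples, sample_rate, frame_ms=20):
    """Cut a 1-D sample sequence into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def drop_silent_frames(frames, threshold=0.01):
    """Keep frames whose RMS energy reaches the threshold.

    Assumption: a simple energy threshold stands in for a real VAD here.
    """
    return [f for f in frames if rms(f) >= threshold]


def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle and split into training, validation, and testing subsets.

    Assumption: the 80/10/10 ratios are hypothetical, not taken from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Synthetic one-second signal: half a second of silence, then a 440 Hz tone.
sample_rate = 16000  # assumed sample rate
signal = [0.0] * 8000 + [
    0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(8000)
]

frames = split_into_frames(signal, sample_rate)
voiced = drop_silent_frames(frames)
train_set, val_set, test_set = split_dataset(voiced)
```

On this synthetic signal, the silent first half is discarded and only the tone frames reach the split. A real pipeline would typically split by recording or speaker rather than by frame to avoid leakage between the subsets, but the abstract does not say which approach was used.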
License
Copyright (c) 2021 Ajai Jose Jacob, Aswin A. Jacob, Ashly Mathew
This work is licensed under a Creative Commons Attribution 4.0 International License.