End-to-End Speech Emotion Recognition Using Deep Learning

Authors

  • Ajai Jose Jacob Department of Computer Applications, Saintgits College of Applied Sciences, Kottayam, India
  • Aswin A. Jacob Department of Computer Applications, Saintgits College of Applied Sciences, Kottayam, India
  • Ashly Mathew Department of Computer Applications, Saintgits College of Applied Sciences, Kottayam, India

Keywords:

Deep neural networks, DNN architecture, Voice Activity Detector (VAD), Deep learning

Abstract

The effectiveness of this technique was tested on the German Corpus database, which is used here to evaluate speech emotion recognition with deep neural networks. The network uses convolutional, pooling, and fully connected layers. The database contains audio recordings of ten actors, five male and five female, covering seven emotional states per actor, of which only three are used. The recordings are divided into 20-millisecond segments; segments that contain no speech are identified by a Voice Activity Detector (VAD) and removed. The remaining segments are divided into training, validation, and testing sets. The deep neural network is optimized using stochastic gradient descent. In the experiment, the approach achieved around 96% accuracy.
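
The preprocessing steps described in the abstract (20 ms segmentation, removal of empty segments, and a three-way split) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the sample rate, the kind of VAD, or the split proportions, so the 16 kHz rate, the simple energy threshold standing in for the VAD, the 80/10/10 split, and all function names below are assumptions chosen for illustration only.

```python
import math
import random


def split_into_frames(samples, sample_rate, frame_ms=20):
    """Cut a 1-D sample sequence into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len  # a trailing partial frame is dropped
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def drop_silent_frames(frames, energy_threshold=1e-4):
    """Stand-in for a VAD (assumption): keep frames whose energy exceeds a threshold."""
    return [f for f in frames if frame_energy(f) > energy_threshold]


def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle and split into training, validation, and testing sets.

    The 80/10/10 proportions are an assumption; the abstract gives none.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Usage on synthetic audio: one second of a 220 Hz tone followed by one second of silence.
sample_rate = 16000  # assumed; not stated in the abstract
tone = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
silence = [0.0] * sample_rate
frames = split_into_frames(tone + silence, sample_rate)
voiced = drop_silent_frames(frames)
train_set, val_set, test_set = split_dataset(voiced)
```

In this sketch the silent second produces frames with zero energy, so only the frames from the tone survive the filter. A production VAD would typically use more robust features than raw energy, and the split would usually be made per speaker or per recording rather than per frame to avoid leakage between sets.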

Published

02-04-2021

Section

Articles

How to Cite

[1] A. J. Jacob, A. A. Jacob, and A. Mathew, “End-to-End Speech Emotion Recognition Using Deep Learning”, IJRESM, vol. 4, no. 3, pp. 134–135, Apr. 2021, Accessed: Dec. 26, 2024. [Online]. Available: https://journal.ijresm.com/index.php/ijresm/article/view/592