Recognition Of Emotion Based On Speech Samples Using CNN Modelling
Akbar Ali
Vishal Sharma
Abstract
The objective of this study is to build an emotion recognition system for speech samples using deep learning techniques. Emotions are a fundamental human trait, serving as a means of expressing thoughts and communicating intentions. Emotion recognition systems analyse audio signals to extract features and predict the emotional state of a speaker. Emotions are generally classified as anger, happiness, sadness, and neutral. These systems rely on spectral and prosodic features to detect emotions: Mel-frequency Cepstral Coefficients (MFCC) are a significant spectral attribute, while prosodic attributes include frequency, loudness, and pitch. The frequency content of an audio signal can be used to distinguish between different sounds and to ascertain the gender of the speaker. The study shows that Support Vector Machines (SVM) can be applied to classification and prediction tasks in emotion recognition, especially in identifying the speaker's gender. Emotions are also identified from these attributes using additional machine learning models such as Radial Basis Function (RBF) and backpropagation networks. The proposed CNN model achieves an accuracy of 72%, demonstrating the reliability of CNN modelling for this task.
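To illustrate the pipeline the abstract describes (MFCC feature extraction followed by CNN classification), the sketch below shows one common way such a system is assembled in Python. This is not the authors' implementation: the use of librosa and Keras, the number of MFCC coefficients (N_MFCC), the fixed frame length (MAX_FRAMES), and all layer sizes are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): MFCC features via librosa,
# classified with a small 2D CNN over the (MFCC x time) feature map.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]  # classes named in the abstract
N_MFCC = 40        # assumed number of MFCC coefficients
MAX_FRAMES = 174   # assumed fixed number of time frames after padding/truncation

def extract_mfcc(path: str) -> np.ndarray:
    """Load a speech sample and return a fixed-size MFCC matrix."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)
    # Pad or truncate along the time axis so every sample has the same shape.
    if mfcc.shape[1] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, MAX_FRAMES - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :MAX_FRAMES]
    return mfcc  # shape (N_MFCC, MAX_FRAMES); add a channel axis before training

def build_cnn() -> tf.keras.Model:
    """A minimal CNN classifier over the MFCC feature map."""
    model = models.Sequential([
        layers.Input(shape=(N_MFCC, MAX_FRAMES, 1)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In use, each audio file would be passed through extract_mfcc, stacked into an array with a trailing channel dimension (e.g. features[..., np.newaxis]), and fed to build_cnn() for training against integer emotion labels.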

This work is licensed under a Creative Commons Attribution 4.0 International License.