A Hybrid Machine Learning Model for Emotion Recognition From Speech Signals

Abstract

Emotion recognition from spoken communication has been attracting growing public interest, and speech emotion recognition (SER) has become an active area within Human-Computer Interaction (HCI). Speech recognition (SR) is the technology concerned with methods for identifying spoken content from speech signals. Owing to technological developments in signal processing methods (SPMs), recognizing emotional expression has become feasible. Voice recognition has grown substantially, and voice products such as Amazon Alexa, Google Home, and Apple HomePod have been deployed, but these operate mainly on voice-based commands. SER is a research area that attempts to infer emotions from speech signals. Several surveys have stated that progress in emotion detection has made many systems simpler and the world a better place to live. Emotion recognition remains challenging for several reasons: emotion varies with the situation, the culture, and the person's facial response, which leads to ambiguous results; the amount of speech available is often inadequate to infer the emotion precisely; and speech databases are lacking for many languages. Nevertheless, SER has been used in applications such as interaction with robots, banking services, and digital games. In existing research, speech emotions such as happiness, anger, and sadness were recognized from feature vectors. The feature sets were extracted from the acoustic signal and include voice pitch, MFCC (Mel-frequency cepstral coefficients), and STE (short-term energy). Various techniques have been developed on these feature sets, and the effect of increasing the number of features supplied to the classifier has been examined. Classification performance has been reported for Indian, Hindi, and Marathi speech.
Moreover, the accuracy for music or normal vocal speech was 80%. This research work designs an SER model that uses the GFCC algorithm to extract the feature sets, based on the DCT and a high-pass filter. The ALO algorithm is then used to select instances with the help of coverage and a fitness function. A novel MSVM algorithm classifies the emotion based on the feature set, and performance metrics such as the accuracy rate are evaluated. The proposed work is simulated in MATLAB and is reported to achieve a higher accuracy rate and lower error rates than existing approaches.
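
The front-end steps named above (a high-pass filter, short-term energy, and the DCT) can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation and it omits the gammatone filterbank that a full GFCC front end would need. The function names, the pre-emphasis coefficient of 0.97, and the frame sizes are assumptions chosen for illustration only.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a commonly used value and is an assumption here.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def dct_ii(values):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    size = len(values)
    return [
        sum(v * math.cos(math.pi / size * (n + 0.5) * k)
            for n, v in enumerate(values))
        for k in range(size)
    ]


if __name__ == "__main__":
    # Synthetic 200 Hz tone sampled at 8 kHz (hypothetical test input).
    rate = 8000
    tone = [math.sin(2 * math.pi * 200 * t / rate) for t in range(rate // 10)]
    filtered = pre_emphasis(tone)
    frames = frame_signal(filtered, frame_len=200, hop=80)
    energies = [short_term_energy(f) for f in frames]
    print(len(frames), "frames; first-frame energy:", energies[0])
```

In a cepstral front end the DCT is normally applied to the log energies of a filterbank rather than to raw samples; `dct_ii` is shown on its own so that the transform itself is easy to check.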
