Publication | Open Access
“Are You Playing a Shooter Again?!” Deep Representation Learning for Audio-Based Video Game Genre Recognition
Year: 2019 | Citations: 21 | References: 42
Artificial Intelligence, Music, Game AI, Engineering, Machine Learning, Deep Spectrum, Game Genres, Speech Recognition, Data Science, Audio Analysis, Voice Recognition, Game Design, Health Sciences, Shooter Again, Game Analytics, Game Study, Audio Retrieval, Deep Learning, Audio Mining, Music Classification, Game Genre, Speech Processing, Speech Perception
In this paper, we present a novel computer audition task: audio-based video game genre classification. The aim of this study is threefold: 1) to check the feasibility of the proposed task; 2) to introduce a new corpus, the Game Genre by Audio + Multimodal Extracts (G2 AME), collected entirely from social multimedia; and 3) to compare the efficacy of various acoustic feature spaces for classifying the G2 AME corpus into six game genres using a linear support vector machine classifier. For the classification, we extract three different feature representations from the game audio files: 1) knowledge-based acoustic features; 2) DEEP SPECTRUM features; and 3) quantized DEEP SPECTRUM features using Bag-of-Audio-Words. The DEEP SPECTRUM features are a deep-learning-based representation obtained by forwarding visual representations of the audio instances (in particular spectrograms, mel-spectrograms, chromagrams, and their deltas) through deep, task-independent, pretrained CNNs. Specifically, activations of the fully connected layers of three common image classification CNNs (GoogLeNet, AlexNet, and VGG16) are used as feature vectors. Results for the six-genre classification problem indicate the suitability of our deep learning approach for this task. Our best method achieves up to 66.9% unweighted average recall under tenfold cross-validation.
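The abstract reports its result as unweighted average recall (UAR) rather than plain accuracy. The following is a minimal sketch of that metric, assuming the standard definition (the mean of per-class recalls, so every class counts equally regardless of how many instances it has). The function name and the toy labels are illustrative assumptions and are not taken from the paper or the G2 AME corpus.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled instance is required")

    total = defaultdict(int)    # instances per true class
    correct = defaultdict(int)  # correctly predicted instances per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of class c = correct[c] / total[c]; UAR averages over classes.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced toy labels (not from the corpus): a classifier
# that always predicts the majority class.
y_true = ["shooter"] * 8 + ["puzzle"] * 2
y_pred = ["shooter"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, UAR={uar:.2f}")
```

In this toy case the majority-class predictor looks reasonable by accuracy but scores only chance level on UAR for two classes, which is why UAR is a common choice when class sizes differ. For a six-class problem, chance-level UAR is about 16.7%.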