“Are You Playing a Shooter Again?!” Deep Representation Learning for Audio-Based Video Game Genre Recognition

Abstract

In this paper, we present a novel computer audition task: audio-based video game genre classification. The aim of this study is threefold: 1) to check the feasibility of the proposed task; 2) to introduce a new corpus, the Game Genre by Audio + Multimodal Extracts (G2AME), collected entirely from social multimedia; and 3) to compare the efficacy of various acoustic feature spaces for classifying the G2AME corpus into six game genres with a linear support vector machine classifier. For the classification, we extract three different feature representations from the game audio files: 1) knowledge-based acoustic features; 2) Deep Spectrum features; and 3) Deep Spectrum features quantized with Bag-of-Audio-Words. The Deep Spectrum features are a deep-learning-based representation obtained by forwarding visual representations of the audio instances, in particular spectrograms, mel-spectrograms, chromagrams, and their deltas, through deep, task-independent, pretrained CNNs. Specifically, activations of fully connected layers from three common image classification CNNs (GoogLeNet, AlexNet, and VGG16) are used as feature vectors. Results for the six-genre classification problem indicate the suitability of our deep learning approach for this task. Our best method achieves up to 66.9% unweighted average recall (UAR) using tenfold cross-validation.
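The abstract relies on two technical ingredients it does not define: Bag-of-Audio-Words (BoAW) quantization and unweighted average recall (UAR). The following is a minimal pure-Python sketch of both, under stated assumptions. Each frame-level feature vector is hard-assigned to its nearest codeword in a given codebook, optional logarithmic term-frequency weighting is applied, and UAR is computed as the unweighted mean of per-class recalls. The function names, codebook, frames, and genre labels are hypothetical toy values. In practice the codebook would be learned from training frames (for example by k-means or random sampling), and the exact codebook size, assignment scheme, and weighting used for G2AME are not stated in this abstract.

```python
import math


def nearest_codeword(vector, codebook):
    """Return the index of the codeword with the smallest squared Euclidean distance."""
    best_idx, best_dist = 0, math.inf
    for idx, word in enumerate(codebook):
        dist = sum((v - w) ** 2 for v, w in zip(vector, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bag_of_audio_words(frames, codebook, log_tf=True):
    """Quantize a variable number of frame vectors into one fixed-length histogram.

    Assumption: hard (single nearest codeword) assignment; log_tf applies
    log10(count + 1), one common BoAW weighting option.
    """
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    if log_tf:
        return [math.log10(c + 1) for c in counts]
    return [float(c) for c in counts]


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of its size."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Fraction of correct predictions, shown here only for contrast with UAR."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


if __name__ == "__main__":
    # Hypothetical 2-D "frame features" and a fixed toy codebook of 3 audio words.
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    frames = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
    print(bag_of_audio_words(frames, codebook, log_tf=False))  # [1.0, 2.0, 1.0]

    # Hypothetical, imbalanced toy labels: three "shooter" clips, one "racing" clip.
    y_true = ["shooter", "shooter", "shooter", "racing"]
    y_pred = ["shooter", "shooter", "racing", "racing"]
    print(accuracy(y_true, y_pred))  # 0.75
    print(round(unweighted_average_recall(y_true, y_pred), 3))  # 0.833
```

In the toy run, accuracy (0.75) and UAR (about 0.833) differ because UAR weights the single "racing" clip as heavily as the three "shooter" clips. This is the usual reason for reporting UAR when genre classes are not equally represented.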
