Environmental Sound Recognition With Time–Frequency Audio Features

TLDR

Environmental sound recognition aims to infer scene context from audio, yet most work relies on spectral features like MFCCs while temporal‑domain signatures of noise‑like sounds such as insect chirps and rain remain underexplored. The study empirically evaluates audio features and introduces the matching pursuit algorithm to extract time‑frequency descriptors for environmental sound classification. Matching pursuit selects atoms from a dictionary to form a flexible, physically interpretable feature set that complements MFCCs and improves classification accuracy, as shown by extensive experiments and listening tests. The resulting system achieves recognition performance comparable to human listeners.

Abstract

The paper considers the task of recognizing environmental sounds in order to understand the scene or context surrounding an audio sensor. A variety of features have been proposed for audio recognition, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the spectral shape of the audio. Environmental sounds such as insect chirping and rain, which are typically noise-like with a broad, flat spectrum, may include strong temporal-domain signatures. However, only a few temporal-domain features have previously been developed to characterize such diverse audio signals. Here, we perform an empirical feature analysis for audio environment characterization and propose using the matching pursuit (MP) algorithm to obtain effective time-frequency features. The MP-based method uses a dictionary of atoms for feature selection, resulting in a flexible, intuitive, and physically interpretable set of features. The MP-based features are used to supplement the MFCC features and yield higher recognition accuracy for environmental sounds. Extensive experiments demonstrate the effectiveness of these joint features for unstructured environmental sound classification, and listening tests are included to study human recognition capabilities. Our recognition system is shown to produce performance comparable to that of human listeners.
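
To make the MP step concrete, the following is a minimal sketch of greedy matching pursuit over a small Gabor-style dictionary, written in Python with NumPy. The abstract does not specify the dictionary, the number of atoms, or how atom parameters are turned into features, so the Gabor atoms, the helper names (`gabor_atom`, `build_dictionary`, `matching_pursuit`, `mp_features`), and the mean/standard-deviation summary are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def gabor_atom(n, scale, center, freq):
    """Unit-norm Gaussian-windowed cosine (an assumed atom shape)."""
    t = np.arange(n)
    window = np.exp(-np.pi * ((t - center) / scale) ** 2)
    atom = window * np.cos(2 * np.pi * freq * (t - center))
    return atom / np.linalg.norm(atom)

def build_dictionary(n, scales, centers, freqs):
    """Stack one atom per (scale, center, freq) combination."""
    params = [(s, u, f) for s in scales for u in centers for f in freqs]
    atoms = np.stack([gabor_atom(n, s, u, f) for s, u, f in params])
    return atoms, np.array(params, dtype=float)

def matching_pursuit(signal, atoms, n_atoms):
    """Greedy MP: repeatedly pick the atom most correlated with the residual."""
    residual = np.asarray(signal, dtype=float).copy()
    picks, coeffs = [], []
    for _ in range(n_atoms):
        corr = atoms @ residual              # inner product with every atom
        k = int(np.argmax(np.abs(corr)))     # best-matching atom index
        residual -= corr[k] * atoms[k]       # subtract its projection
        picks.append(k)
        coeffs.append(corr[k])
    return np.array(picks), np.array(coeffs), residual

def mp_features(params, picks):
    """Hypothetical descriptor: mean/std of scale and frequency of chosen atoms."""
    chosen = params[picks]
    scale, freq = chosen[:, 0], chosen[:, 2]
    return np.array([freq.mean(), freq.std(), scale.mean(), scale.std()])
```

In a pipeline like the one the abstract describes, a descriptor of this kind would be computed per audio frame and concatenated with that frame's MFCC vector before classification. Because each selected atom has an explicit scale, position, and frequency, the resulting numbers can be read directly as statements about where the signal's energy sits in time and frequency.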
