Publication | Open Access
Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires
Citations: 296 | References: 85 | Year: 2020
Topics: Music, Engineering, Machine Learning, Acoustic Modeling, Speech Recognition, Latent Modeling, Complex Vocalizations, Data Science, Phonetics, Vocal Music, Health Sciences, Cognitive Science, Behavioral Neuroscience, Speech Production, Latent Structure, Deep Learning, Speech Communication, Animal Behaviour, Bioacoustics, Evolutionary Biology, Animal Behavior, Speech Processing, Animal Communication, Speech Perception, Animal Vocalizations
Animal vocalizations range from simple repeated calls to complex sequences. Characterizing them demands considerable effort and intuition, and human perception can introduce bias. The study introduces computational methods that project animal vocalizations into low-dimensional latent spaces learned from spectrograms and analyze them either as discrete sequences or as continuous latent variables. Applied to datasets from more than 20 species, the methods disentangle spectro-temporal structure and reveal long-timescale organization. The latent projections expose complex features in visually intuitive and quantifiable ways, enabling powerful comparative analyses of vocal acoustics.
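The summary above describes projecting spectrograms into a low-dimensional latent space. As a minimal sketch, assuming each vocal element has already been segmented into a fixed-length waveform, the code below computes log-magnitude spectrograms with NumPy and projects them to two dimensions. It uses PCA (via SVD) purely as a simple linear stand-in: this page does not name the projection method the authors use, and a nonlinear method such as UMAP could be swapped in. The helper names (`log_spectrogram`, `project_pca`, `synthetic_call`) and the synthetic tone "calls" are hypothetical.

```python
import numpy as np

def log_spectrogram(wave, n_fft=256, hop=64):
    """Log-magnitude STFT spectrogram (frequency x time) using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_fft // 2 + 1)
    return np.log1p(mag).T

def project_pca(spectrograms, n_components=2):
    """Flatten each spectrogram and project it onto the top principal components.

    PCA is a linear stand-in (an assumption), not the authors' stated method.
    """
    X = np.stack([s.ravel() for s in spectrograms])
    X = X - X.mean(axis=0)  # center each feature before the SVD
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T  # shape: (n_vocalizations, n_components)

def synthetic_call(freq_hz, rng, sr=16000, dur=0.1):
    """Hypothetical stand-in for a segmented vocal element: a noisy pure tone."""
    t = np.arange(int(sr * dur)) / sr
    return np.sin(2 * np.pi * freq_hz * t) + 0.1 * rng.standard_normal(t.size)

rng = np.random.default_rng(0)
calls = [synthetic_call(1000, rng) for _ in range(20)] + \
        [synthetic_call(3000, rng) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)  # which synthetic "call type" each element is
latent = project_pca([log_spectrogram(c) for c in calls])
print(latent.shape)  # prints (40, 2)
```

With two well-separated tone frequencies, the two groups fall on opposite sides of the first latent axis. With real recordings, segmentation, spectrogram parameters, and the choice of projection all affect the resulting latent space.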
Animals produce vocalizations that range in complexity from a single repeated call to hundreds of unique vocal elements patterned in sequences unfolding over hours. Characterizing complex vocalizations can require considerable effort and a deep intuition about each species' vocal behavior. Even with a great deal of experience, human characterizations of animal communication can be affected by human perceptual biases. We present a set of computational methods for projecting animal vocalizations into low-dimensional latent representational spaces that are directly learned from the spectrograms of vocal signals. We apply these methods to diverse datasets from over 20 species, including humans, bats, songbirds, mice, cetaceans, and nonhuman primates. Latent projections uncover complex features of data in visually intuitive and quantifiable ways, enabling high-powered comparative analyses of vocal acoustics. We introduce methods for analyzing vocalizations both as discrete sequences and as continuous latent variables. Each method can be used to disentangle complex spectro-temporal structure and observe long-timescale organization in communication.
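The abstract also describes treating vocalizations as discrete sequences. One common way to summarize such sequences is a first-order transition matrix over element labels. The sketch below assumes labels have already been assigned to each element (for instance by clustering latent projections); the function `transition_matrix` and the example bouts are hypothetical, and this page does not state which sequence statistics the authors compute.

```python
import numpy as np

def transition_matrix(sequences, n_labels):
    """First-order transition probabilities P[i, j] = P(next = j | current = i).

    `sequences` is a list of label sequences (e.g. one per bout); transitions are
    counted only within a sequence, never across bout boundaries.
    """
    counts = np.zeros((n_labels, n_labels))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for labels that never precede another element stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical element labels, e.g. cluster IDs assigned to latent projections.
bouts = [[0, 1, 2, 0, 1, 2], [0, 1, 1, 2]]
P = transition_matrix(bouts, n_labels=3)
print(P)
```

Counting within bouts only avoids inventing a transition between the last element of one bout and the first element of the next, which would otherwise inflate spurious transitions.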