Integrating prosodic features in extractive meeting summarization

TLDR

Speech contains information beyond the text that can be valuable for automatic speech summarization. The study evaluates how to use acoustic/prosodic features effectively for extractive meeting summarization and how to integrate them with lexical and structural information for further improvement. It proposes normalization methods for prosodic features based on speaker, topic, or local context, and combines them with lexical and structural cues. Prosodic features alone outperform the non-prosodic features on both human transcripts and speech recognition output, and a decision-level combination with the non-prosodic features yields further gains, surpassing each individual model.

Abstract

Speech contains information beyond the text that can be valuable for automatic speech summarization. In this paper, we evaluate how to effectively use acoustic/prosodic features for extractive meeting summarization, and how to integrate prosodic features with lexical and structural information for further improvement. To properly represent prosodic features, we propose different normalization methods based on speaker, topic, or local context information. Our experimental results show that prosodic features alone achieve better performance than the non-prosodic information on both the human transcripts and the speech recognition output. In addition, a decision-level combination of the prosodic and non-prosodic features yields further gains, outperforming the individual models.
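The abstract does not give the normalization formulas or the combination rule, so the following is only a minimal sketch under stated assumptions. It assumes z-score normalization (subtract the group mean, divide by the group standard deviation) of a per-sentence prosodic value such as mean pitch, where the group is a speaker or topic ID, or a sliding window of neighboring sentences for local context. It further assumes that the decision-level combination is a weighted average of per-sentence summary-worthiness posteriors from a prosodic and a non-prosodic classifier. All function names, the window size, the weight, and the selection ratio are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore_by_group(values, groups):
    """Assumed speaker/topic normalization: z-score each value within its group."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in buckets.items()}
    out = []
    for v, g in zip(values, groups):
        mu, sd = stats[g]
        # A group with no variation carries no relative information; map it to 0.
        out.append((v - mu) / sd if sd > 0 else 0.0)
    return out


def zscore_local(values, window=2):
    """Assumed local-context normalization: z-score against the sentences
    within +/- `window` positions (the sentence itself included)."""
    out = []
    n = len(values)
    for i, v in enumerate(values):
        ctx = values[max(0, i - window):min(n, i + window + 1)]
        mu, sd = mean(ctx), pstdev(ctx)
        out.append((v - mu) / sd if sd > 0 else 0.0)
    return out


def combine_decisions(p_prosodic, p_other, weight=0.5):
    """Hypothetical decision-level fusion: weighted average of the two
    classifiers' per-sentence posterior probabilities."""
    return [weight * a + (1 - weight) * b for a, b in zip(p_prosodic, p_other)]


def select_summary(scores, ratio=0.5):
    """Extractive selection: keep the top-scoring sentences, in original order."""
    k = max(1, round(len(scores) * ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(ranked)


if __name__ == "__main__":
    # Hypothetical mean pitch (Hz) of six sentences from two speakers.
    pitch = [200.0, 220.0, 180.0, 120.0, 130.0, 110.0]
    speakers = ["A", "A", "A", "B", "B", "B"]
    print(zscore_by_group(pitch, speakers))
    print(zscore_local(pitch, window=1))
    fused = combine_decisions([0.9, 0.2, 0.8, 0.1], [0.5, 0.4, 0.9, 0.2])
    print(select_summary(fused, ratio=0.5))
```

In the example, speaker A's 220 Hz sentence and speaker B's 130 Hz sentence receive the same speaker-normalized value, because each sits the same number of standard deviations above its own speaker's mean. This illustrates why raw prosodic values can be misleading across speakers with different baseline pitch.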
