The AT&T spoken language understanding system

TLDR

Spoken language understanding (SLU) extracts meaning from natural language speech. Over the past decade it has been applied in goal-oriented dialog systems for limited domains, ranging from fixed-grammar phrase recognition to combined intent and named-entity extraction. The authors present the SLU system of VoiceTone®, an AT&T service for building, deploying, and hosting spoken dialog applications. The system is designed to use data-driven machine learning so that applications scale and can be deployed quickly for new domains with minimal human effort. It determines intents with statistical classifiers trained on labeled data and extracts named entities with rule-based fixed grammars. It relies on three techniques: a predicate-argument representation of semantic content, classifiers that integrate hand-crafted rules with rules learned from data, and an active learning framework that reduces labeling effort. The system is evaluated on two deployed VoiceTone® applications.

Abstract

Spoken language understanding (SLU) aims at extracting meaning from natural language speech. Over the past decade, a variety of practical goal-oriented spoken dialog systems have been built for limited domains. SLU in these systems ranges from understanding predetermined phrases through fixed grammars, extracting some predefined named entities, extracting users' intents for call classification, to combinations of users' intents and named entities. In this paper, we present the SLU system of VoiceTone® (a service provided by AT&T where AT&T develops, deploys and hosts spoken dialog applications for enterprise customers). The SLU system includes extracting both intents and named entities from the users' utterances. For intent determination, we use statistical classifiers trained from labeled data, and for named entity extraction we use rule-based fixed grammars. The focus of our work is to exploit data and to use machine learning techniques to create scalable SLU systems which can be quickly deployed for new domains with minimal human intervention. These objectives are achieved by 1) using the predicate-argument representation of semantic content of an utterance; 2) extending statistical classifiers to seamlessly integrate hand-crafted classification rules with the rules learned from data; and 3) developing an active learning framework to minimize the human labeling effort for quickly building the classifier models and adapting them to changes. We present an evaluation of this system using two deployed applications of VoiceTone®.
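
The abstract does not say how the active learning framework chooses utterances for labeling. As an illustration only, the sketch below assumes a common certainty-based strategy: send to human labelers the utterances on which the current intent classifier is least confident. The function names, the toy classifier, and the predicate-argument style labels such as `Request(Billing)` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Assumption: a classifier maps an utterance to {intent label: probability}.
Classifier = Callable[[str], Dict[str, float]]


def select_for_labeling(
    utterances: List[str], classify: Classifier, budget: int
) -> List[str]:
    """Return the `budget` utterances whose top intent score is lowest."""
    scored = [(max(classify(u).values()), u) for u in utterances]
    # Lowest confidence first; the sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: pair[0])
    return [u for _, u in scored[:budget]]


def toy_classifier(utterance: str) -> Dict[str, float]:
    """Toy stand-in for a trained statistical intent classifier."""
    words = utterance.lower().split()
    if "bill" in words:
        return {"Request(Billing)": 0.9, "Other": 0.1}
    if "agent" in words:
        return {"Request(Agent)": 0.8, "Other": 0.2}
    # No keyword matched, so the scores are spread out (low confidence).
    return {"Request(Billing)": 0.4, "Request(Agent)": 0.35, "Other": 0.25}


pool = [
    "I have a question about my bill",
    "um yeah the thing from last week",
    "agent please",
]
to_label = select_for_labeling(pool, toy_classifier, budget=1)
print(to_label)
```

Under this assumed strategy, labeling effort goes to the utterances the model handles worst, which is one way the labeling cost described in the abstract could be reduced.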
