Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM

TLDR

Sequence‑to‑sequence deep learning has emerged as a new paradigm for spoken language understanding, yet prior work has largely focused on single‑domain models for tasks such as slot filling or domain classification, comparing deep learning to conventional methods like conditional random fields. This paper proposes a holistic multi‑domain, multi‑task framework that jointly estimates complete semantic frames for all user utterances addressed to a conversational system, leveraging bi‑directional RNN‑LSTM to handle the complexity. The authors implement a bi‑directional RNN‑LSTM architecture that jointly models slot filling, intent determination, and domain classification across multiple domains, building a joint multi‑domain model that allows data from each domain to reinforce one another and exploring alternative lexical‑context architectures. Experimental results on Microsoft Cortana real‑user data demonstrate that this joint multi‑domain RNN‑LSTM approach outperforms alternative single‑domain/task deep‑learning methods.

Abstract

Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most of the previous studies explored this framework for building single domain models for each task, such as slot filling or domain classification, comparing deep learning based approaches with conventional ones like conditional random fields. This paper proposes a holistic multi-domain, multi-task (i.e. slot filling, domain and intent detection) modeling approach to estimate complete semantic frames for all user utterances addressed to a conversational system, demonstrating the distinctive power of deep learning methods, namely bi-directional recurrent neural network (RNN) with long-short term memory (LSTM) cells (RNN-LSTM) to handle such complexity. The contributions of the presented work are three-fold: (i) we propose an RNN-LSTM architecture for joint modeling of slot filling, intent determination, and domain classification; (ii) we build a joint multi-domain model enabling multi-task deep learning where the data from each domain reinforces each other; (iii) we investigate alternative architectures for modeling lexical context in spoken language understanding. In addition to the simplicity of the single model framework, experimental results show the power of such an approach on Microsoft Cortana real user data over alternative methods based on single domain/task deep learning.

References

Page 1

	Year	Citations

Page 1