An Attention-over-Attention Generative Model for Joint Multiple Intent Detection and Slot Filling

Wei Zhu

TL;DR

This work tackles multi-intent spoken language understanding by recasting joint intent detection and slot filling as a sequence-to-sequence problem. It introduces GEMIS, a generative framework built on a modified BART encoder-decoder that uses an attention-over-attention (AoA) decoder to jointly predict multiple intents and slot values, aided by a pointer network for position predictions. To address dataset realism, the authors construct MultiATIS and MultiSNIPS using the NSP head of BERT to select coherent utterance concatenations, resulting in samples that better reflect real-world multi-intent usage. Empirical results show state-of-the-art performance on MixATIS, MixSNIPS, and the proposed MultiATIS/MultiSNIPS, with larger gains as the number of intents grows and a clear advantage from AoA over standard cross-attention. Overall, the approach demonstrates the viability and benefits of a unified generative model for complex SLU tasks and realistic data construction for evaluation.

Abstract

In task-oriented dialogue systems, spoken language understanding (SLU) is a critical component consisting of two sub-tasks: intent detection and slot filling. Most existing methods focus on single-intent SLU, where each utterance has only one intent. However, in real-world scenarios users usually express multiple intents in an utterance, which poses a challenge for existing dialogue systems and datasets. In this paper, we propose a generative framework to simultaneously address multiple intent detection and slot filling. In particular, an attention-over-attention decoder is proposed to handle the variable number of intents and the interference between the two sub-tasks by incorporating an inductive bias into the process of multi-task learning. In addition, we construct two new multi-intent SLU datasets from single-intent utterances by taking advantage of the next sentence prediction (NSP) head of the BERT model. Experimental results demonstrate that our proposed attention-over-attention generative model achieves state-of-the-art performance on two public datasets, MixATIS and MixSNIPS, and on our constructed datasets.
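
The NSP-based dataset construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `select_coherent_pairs`, the `threshold`, and the `" and "` connector are assumptions made for illustration, and `toy_nsp_score` is a word-overlap stand-in so the example runs without a model. In practice the scorer would presumably return the probability that BERT's NSP head assigns to the second utterance following the first.

```python
from itertools import permutations
from typing import Callable, List, Tuple


def select_coherent_pairs(
    utterances: List[str],
    nsp_score: Callable[[str, str], float],
    threshold: float = 0.5,      # assumed cut-off, not taken from the paper
    connector: str = " and ",    # assumed way of joining two utterances
) -> List[Tuple[str, float]]:
    """Concatenate ordered utterance pairs whose score reaches the threshold."""
    kept = []
    for first, second in permutations(utterances, 2):
        score = nsp_score(first, second)
        if score >= threshold:
            kept.append((first + connector + second, score))
    # Most coherent concatenations first (stable sort keeps ties in input order).
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


def toy_nsp_score(first: str, second: str) -> float:
    """Stand-in scorer based on word overlap (Jaccard); NOT a real NSP model."""
    a, b = set(first.lower().split()), set(second.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    single_intent = [
        "play some jazz music",
        "add this jazz song to my playlist",
        "book a table for two",
    ]
    for text, score in select_coherent_pairs(single_intent, toy_nsp_score, 0.1):
        print(round(score, 2), text)
```

Passing the scorer as a callable keeps the selection logic independent of any particular model, so the toy scorer can be swapped for a real NSP head without changing the rest of the code.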

Paper Structure

This paper contains 21 sections, 9 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An example illustrating the task of joint multiple intent detection and slot filling. Slot annotations are shown in the BIO format, where BIO stands for Begin/Inside/Outside.
  • Figure 2: Comparison between previous state-of-the-art architectures (a) and our proposed generative architecture (b). In contrast to previous architectures where an interaction module is needed and trained from scratch, our architecture reformulates the two sub-tasks as a unified sequence-to-sequence task and uses a pre-trained shared decoder to capture the relationship between the two sub-tasks.
  • Figure 3: Illustration of the overall architecture and the attention-over-attention (AoA) structure.
  • Figure 4: Intent co-occurrence of MixSNIPS and MultiSNIPS. (a) In MixSNIPS (Qin et al., 2020), the co-occurrence distribution for each intent is uniform because MixSNIPS randomly concatenates single-intent utterances regardless of the relationship between the intents. (b) In our constructed MultiSNIPS, related intents, e.g., AddToPlaylist and PlayMusic, BookRestaurant and GetWeather, frequently appear in the same utterance, which is more realistic.
  • Figure 5: Overall accuracy for different numbers of intents. Our proposed GEMIS significantly outperforms previous state-of-the-art methods as the number of intents increases.
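
To make the BIO annotation of Figure 1 and the per-intent-count metric of Figure 5 concrete, the following is a minimal sketch rather than the paper's evaluation script. It assumes that an utterance counts as correct only when the predicted intent set and all decoded slot spans match the gold annotation; the function names, the dictionary keys, and the choice to ignore a stray `I-` tag are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (slot label, start index, inclusive end index)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into labelled token spans."""
    spans: List[Span] = []
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i - 1))
            label, start = None, None
        if tag.startswith("B-"):
            label, start = tag[2:], i
        # A stray "I-" with no matching open span is ignored (assumption).
    return spans


def overall_accuracy_by_intent_count(
    examples: List[Dict[str, List[str]]],
) -> Dict[int, float]:
    """Exact-match accuracy, grouped by the number of gold intents."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for ex in examples:
        n = len(ex["gold_intents"])
        total[n] += 1
        intents_ok = set(ex["gold_intents"]) == set(ex["pred_intents"])
        slots_ok = bio_to_spans(ex["gold_tags"]) == bio_to_spans(ex["pred_tags"])
        if intents_ok and slots_ok:
            correct[n] += 1
    return {n: correct[n] / total[n] for n in sorted(total)}


if __name__ == "__main__":
    tags = ["O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
    print(bio_to_spans(tags))
```

Comparing intents as sets makes the check independent of the order in which intents are predicted, which seems appropriate when an utterance carries a variable number of them.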