An Attention-over-Attention Generative Model for Joint Multiple Intent Detection and Slot Filling
Wei Zhu
TL;DR
This work tackles multi-intent spoken language understanding by recasting joint intent detection and slot filling as a sequence-to-sequence problem. It introduces GEMIS, a generative framework built on a modified BART encoder-decoder that uses an attention-over-attention (AoA) decoder to jointly predict multiple intents and slot values, aided by a pointer network for position predictions. To address dataset realism, the authors construct MultiATIS and MultiSNIPS using the NSP head of BERT to select coherent utterance concatenations, resulting in samples that better reflect real-world multi-intent usage. Empirical results show state-of-the-art performance on MixATIS, MixSNIPS, and the proposed MultiATIS/MultiSNIPS, with larger gains as the number of intents grows and a clear advantage from AoA over standard cross-attention. Overall, the approach demonstrates the viability and benefits of a unified generative model for complex SLU tasks and realistic data construction for evaluation.
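As a rough illustration of what recasting joint intent detection and slot filling as a sequence-to-sequence problem can look like, the sketch below flattens intents and slot spans into one target sequence and recovers them again. The target layout, the `<intent:...>`, `<slot:...>` and `<sep>` markers, and the function names `linearize` and `delinearize` are assumptions made for this example, not the format used by GEMIS. Slot values are represented by token positions, which is the kind of output a pointer network could predict.

```python
from typing import List, Tuple, Union

Span = Tuple[int, int, str]          # (start, end, slot label), inclusive token indices
TargetItem = Union[int, str]


def linearize(tokens: List[str], intents: List[str], slots: List[Span]) -> List[TargetItem]:
    """Flatten intents and slot spans into a single target sequence (hypothetical layout)."""
    target: List[TargetItem] = [f"<intent:{intent}>" for intent in intents]
    target.append("<sep>")
    for start, end, label in sorted(slots):
        if not (0 <= start <= end < len(tokens)):
            raise ValueError(f"span ({start}, {end}) is outside the utterance")
        # Positions point back into the input utterance instead of copying words.
        target.extend([start, end, f"<slot:{label}>"])
    return target


def delinearize(tokens: List[str], target: List[TargetItem]):
    """Recover the intents and (slot label, slot value) pairs from a target sequence."""
    sep = target.index("<sep>")
    intents = [item[len("<intent:"):-1] for item in target[:sep]]
    rest = target[sep + 1:]
    slots = []
    for i in range(0, len(rest), 3):
        start, end, tag = rest[i:i + 3]
        slots.append((tag[len("<slot:"):-1], " ".join(tokens[start:end + 1])))
    return intents, slots
```

Because the number of intent markers before `<sep>` is not fixed, a layout of this kind handles utterances with any number of intents without changing the output structure.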
Abstract
In task-oriented dialogue systems, spoken language understanding (SLU) is a critical component that consists of two sub-tasks: intent detection and slot filling. Most existing methods focus on single-intent SLU, where each utterance has only one intent. However, in real-world scenarios users often express multiple intents in a single utterance, which poses a challenge for existing dialogue systems and datasets. In this paper, we propose a generative framework that addresses multiple intent detection and slot filling simultaneously. In particular, we propose an attention-over-attention decoder that handles the variable number of intents and the interference between the two sub-tasks by incorporating an inductive bias into the multi-task learning process. In addition, we construct two new multi-intent SLU datasets from single-intent utterances by taking advantage of the next sentence prediction (NSP) head of the BERT model. Experimental results demonstrate that our proposed attention-over-attention generative model achieves state-of-the-art performance on two public datasets, MixATIS and MixSNIPS, and on our constructed datasets.
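The dataset construction idea can be sketched as follows: score ordered pairs of single-intent utterances for coherence and keep only the concatenations that score well. This is a minimal sketch, not the authors' pipeline. The function `build_multi_intent_samples`, the `threshold` value, the `" and "` connector, and the rule of pairing only utterances with different intents are all assumptions made for illustration. The `coherence_score` callable stands in for the "is next sentence" probability from BERT's NSP head, and `toy_score` is a hard-coded placeholder so that the example runs without a model.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Utterance = Tuple[str, str]             # (text, intent label)
Sample = Tuple[str, List[str], float]   # (combined text, intents, coherence score)


def build_multi_intent_samples(
    utterances: List[Utterance],
    coherence_score: Callable[[str, str], float],
    threshold: float = 0.5,      # assumed cut-off, not taken from the paper
    connector: str = " and ",    # assumed way of joining two utterances
) -> List[Sample]:
    """Keep ordered utterance pairs whose concatenation the scorer judges coherent."""
    samples: List[Sample] = []
    for (text_a, intent_a), (text_b, intent_b) in permutations(utterances, 2):
        if intent_a == intent_b:
            continue  # only pair utterances with different intents (assumption)
        score = coherence_score(text_a, text_b)
        if score >= threshold:
            samples.append((text_a + connector + text_b, [intent_a, intent_b], score))
    return samples


def toy_score(first: str, second: str) -> float:
    """Placeholder scorer; a real one would return the NSP 'is next' probability."""
    coherent_pairs = {("book a flight to boston", "show me hotels there"): 0.9}
    return coherent_pairs.get((first, second), 0.1)
```

In practice the scorer could wrap a pretrained NSP model, for example `BertForNextSentencePrediction` from the Hugging Face `transformers` library, returning the softmax probability that the second utterance follows the first.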
