Efficient slot labelling
Vladimir Vlasov
TL;DR
This work tackles slot labelling in task-oriented dialogue, where large PLMs carry a heavy computational burden and depend on pretraining. It introduces a compact architecture that combines trainable word-query attention with relative attention, a gating mechanism, and a CRF tagger, and adds a block-diagonal dense layer to keep the parameter count low (approximately $10^6$). Across RESTAURANTS-8K, MTOP, and ATIS, the model matches or surpasses state-of-the-art baselines with a much smaller parameter count, especially in low-resource settings. The findings support deploying efficient, context-aware slot-labelling models on-device, with demonstrated multilingual effectiveness and design choices backed by ablations.
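To illustrate why a block-diagonal dense layer reduces parameters, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the layer sizes, and the number of blocks are all hypothetical, and biases are omitted for brevity. The idea shown is only that restricting a weight matrix to `num_blocks` diagonal blocks divides the weight count by `num_blocks`.

```python
import random


def dense_param_count(d_in, d_out):
    """Number of weights in an ordinary dense layer (bias omitted)."""
    return d_in * d_out


def block_diagonal_param_count(d_in, d_out, num_blocks):
    """Number of weights when only num_blocks diagonal blocks are kept."""
    assert d_in % num_blocks == 0 and d_out % num_blocks == 0
    return num_blocks * (d_in // num_blocks) * (d_out // num_blocks)


def make_blocks(d_in, d_out, num_blocks, seed=0):
    """Create num_blocks small weight matrices, each (d_out/k) x (d_in/k)."""
    rng = random.Random(seed)
    b_in, b_out = d_in // num_blocks, d_out // num_blocks
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(b_in)] for _ in range(b_out)]
        for _ in range(num_blocks)
    ]


def block_diagonal_forward(x, blocks):
    """Apply each block to its own slice of x and concatenate the outputs."""
    out, offset = [], 0
    for block in blocks:
        b_in = len(block[0])
        chunk = x[offset:offset + b_in]
        out.extend(sum(w * v for w, v in zip(row, chunk)) for row in block)
        offset += b_in
    return out


# Hypothetical sizes, chosen only for illustration.
full = dense_param_count(256, 256)
blocked = block_diagonal_param_count(256, 256, num_blocks=8)
y = block_diagonal_forward([1.0] * 8, make_blocks(8, 4, num_blocks=2))
```

With these illustrative sizes the blocked layer holds one eighth of the weights of the full layer. The trade-off is that each output unit only sees the inputs in its own block.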
Abstract
Slot labelling is an essential component of any dialogue system, aiming to find important arguments in every user turn. Common approaches involve large pre-trained language models (PLMs) like BERT or RoBERTa, but they face challenges such as high computational requirements and dependence on pre-training data. In this work, we propose a lightweight method that performs on par with or better than state-of-the-art PLM-based methods while having almost 10x fewer trainable parameters. This makes it especially applicable to real-life industry scenarios.
