ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler
Paramita Mirza, Viju Sudhi, Soumya Ranjan Sahoo, Sinchana Ramakanth Bhat
TL;DR
This work tackles the data-efficiency challenge of intent classification (IC) and slot filling (SF) in task-oriented dialogue by leveraging instruction-tuned large language models (LLMs). It introduces ILLUMINER, which reframes IC and SF as language-generation tasks and uses a single-prompt information-extraction approach for SF, combined with parameter-efficient fine-tuning (PEFT) such as LoRA on FLAN-T5-xxl. Across SNIPS, MASSIVE, and MultiWOZ, ILLUMINER with LoRA outperforms state-of-the-art joint IC+SF methods and few-shot in-context learning with GPT-3.5, achieving strong SF performance with less than 6% of the full training data and demonstrating robust cross-dataset and multilingual generalization. The work also provides extensive ablations on instruction-tuned versus non-instruction-tuned models, model size, PEFT techniques, and the impact of exposing label descriptions, offering practical guidance for deploying IC/SF in industry with reduced data and compute requirements.
Abstract
State-of-the-art intent classification (IC) and slot filling (SF) methods often rely on data-intensive deep learning models, limiting their practicality for industry applications. Large language models, on the other hand, particularly instruction-tuned models (Instruct-LLMs), exhibit remarkable zero-shot performance across various natural language tasks. This study evaluates Instruct-LLMs on popular benchmark datasets for IC and SF, emphasizing their capacity to learn from fewer examples. We introduce ILLUMINER, an approach framing IC and SF as language generation tasks for Instruct-LLMs, with a more efficient SF-prompting method compared to prior work. A comprehensive comparison with multiple baselines shows that our approach, using the FLAN-T5 11B model, outperforms the state-of-the-art joint IC+SF method and in-context learning with GPT-3.5 (175B), particularly in slot filling, by 11.1–32.2 percentage points. Additionally, our in-depth ablation study demonstrates that parameter-efficient fine-tuning requires less than 6% of the training data to yield performance comparable to traditional full-weight fine-tuning.
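To make the idea of framing IC and SF as language generation more concrete, the following is a minimal sketch of how such prompts might be built and how a single-prompt SF answer could be parsed. The prompt wording, the function names (`build_ic_prompt`, `build_sf_prompt`, `parse_sf_output`), and the `slot: value` output format are assumptions made for illustration; they are not the templates used by ILLUMINER, and the sketch does not call any model.

```python
from typing import Dict, List


def build_ic_prompt(utterance: str, intents: List[str]) -> str:
    """Build a text prompt asking a model to pick one intent label.

    The wording is a hypothetical template, not the paper's prompt.
    """
    options = "\n".join(f"- {name}" for name in intents)
    return (
        "Given the user utterance, choose the matching intent from the options.\n"
        f"Options:\n{options}\n"
        f"Utterance: {utterance}\n"
        "Intent:"
    )


def build_sf_prompt(utterance: str, slots: List[str]) -> str:
    """Build ONE prompt that asks for all slot types at once.

    This is the single-prompt idea: one generation per utterance,
    rather than one prompt per slot type. The template is hypothetical.
    """
    slot_list = ", ".join(slots)
    return (
        "Extract the value of each slot type from the user utterance. "
        "Answer with one 'slot: value' pair per line and skip absent slots.\n"
        f"Slot types: {slot_list}\n"
        f"Utterance: {utterance}\n"
        "Slots:"
    )


def parse_sf_output(generated: str, slots: List[str]) -> Dict[str, str]:
    """Parse generated 'slot: value' lines, keeping only known slot types."""
    allowed = set(slots)
    result: Dict[str, str] = {}
    for line in generated.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Ignore malformed lines, unknown slot names, and empty values.
        if sep and name in allowed and value:
            result[name] = value
    return result
```

In this sketch, the strings returned by the two builders would be passed to an instruction-tuned model such as FLAN-T5, and the generated text for the SF prompt would be handed to `parse_sf_output`. Filtering on the known slot names is one simple way to discard labels the model invents.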
