ILLUMINER: Instruction-tuned Large Language Models as Few-shot Intent Classifier and Slot Filler

Paramita Mirza, Viju Sudhi, Soumya Ranjan Sahoo, Sinchana Ramakanth Bhat

TL;DR

This work tackles the data-efficiency challenge of intent classification (IC) and slot filling (SF) in task-oriented dialogue by leveraging instruction-tuned large language models (LLMs). It introduces ILLUMINER, which reframes IC and SF as language-generation tasks and uses a single-prompt information-extraction approach for SF, combined with parameter-efficient fine-tuning (PEFT) such as LoRA on FLAN-T5-xxl. Across SNIPS, MASSIVE, and MultiWOZ, ILLUMINER with LoRA outperforms state-of-the-art joint IC+SF methods and few-shot in-context learning with GPT-3.5, achieving strong SF performance with less than 6% of the full training data and demonstrating robust cross-dataset and multilingual generalization. The work also provides extensive ablations on instruction-tuned versus non-instruction-tuned models, model size, PEFT techniques, and the impact of exposing label descriptions, offering practical guidance for deploying IC/SF in industry with reduced data and compute requirements.

Abstract

State-of-the-art intent classification (IC) and slot filling (SF) methods often rely on data-intensive deep learning models, limiting their practicality for industry applications. Large language models, on the other hand, particularly instruction-tuned models (Instruct-LLMs), exhibit remarkable zero-shot performance across various natural language tasks. This study evaluates Instruct-LLMs on popular benchmark datasets for IC and SF, emphasizing their capacity to learn from fewer examples. We introduce ILLUMINER, an approach framing IC and SF as language generation tasks for Instruct-LLMs, with a more efficient SF-prompting method compared to prior work. A comprehensive comparison with multiple baselines shows that our approach, using the FLAN-T5 11B model, outperforms the state-of-the-art joint IC+SF method and in-context learning with GPT-3.5 (175B), particularly in slot filling, by 11.1–32.2 percentage points. Additionally, our in-depth ablation study demonstrates that parameter-efficient fine-tuning requires less than 6% of training data to yield performance comparable to traditional full-weight fine-tuning.

Paper Structure (39 sections, 7 figures, 8 tables)

Figures (7)

  • Figure 1: An example of our prompting methods for intent classification and slot filling, for a given user utterance "Find me a restaurant serving Italian food in Torino". Compared to prior work (Fig. 2), we only need a single inference for slot filling.
  • Figure 2: Multi-prompt IE for slot filling (Hou et al., 2022), requiring $|S|$ inferences for $|S|$ slot types.
  • Figure 3: Instruct-LLMs vs their corresponding base models (non-instruct).
  • Figure 4: Performance of $\text{FLAN-T5}_{\text{LoRA}}$ across FLAN-T5 model sizes. Solid-colored bars indicate the adapters' training time for IC, and striped bars for SF.
  • Figure 5: Performance of $\text{FLAN-T5-xxl}_{\text{LoRA}}$ with varying numbers of examples ($k$) per label. Solid-colored bars indicate the percentage of training data used for IC, and striped bars for SF.
  • ...and 2 more figures
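
The captions of Figures 1 and 2 contrast single-prompt slot filling (one inference per utterance) with multi-prompt IE (one inference per slot type, so $|S|$ inferences). The sketch below illustrates only that difference in the number of prompts. The prompt wording, the slot names, the "slot: value" output format, and all function names are hypothetical assumptions made for illustration; they are not the paper's actual templates.

```python
from typing import Dict, List


def multi_prompt_sf(utterance: str, slot_types: List[str]) -> List[str]:
    """Build one prompt per slot type, i.e. |S| prompts for |S| slot types."""
    return [
        f'Utterance: "{utterance}"\n'
        f'What is the value of "{slot}"? Answer "none" if it is absent.'
        for slot in slot_types
    ]


def single_prompt_sf(utterance: str, slot_types: List[str]) -> str:
    """Build a single prompt that asks for all slot types at once."""
    slots = ", ".join(slot_types)
    return (
        f'Utterance: "{utterance}"\n'
        f"Extract the value of each of these slot types, one "
        f'"slot: value" pair per line: {slots}'
    )


def parse_single_prompt_output(generated: str, slot_types: List[str]) -> Dict[str, str]:
    """Parse hypothetical "slot: value" lines, dropping unknown or empty slots."""
    result: Dict[str, str] = {}
    for line in generated.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name in slot_types and value and value.lower() != "none":
            result[name] = value
    return result


# Hypothetical slot inventory for the utterance used in Figure 1.
utterance = "Find me a restaurant serving Italian food in Torino"
slot_types = ["cuisine", "city", "restaurant name"]

prompts_multi = multi_prompt_sf(utterance, slot_types)   # one prompt per slot type
prompt_single = single_prompt_sf(utterance, slot_types)  # one prompt in total
```

Under these assumptions, the multi-prompt variant issues as many model calls as there are slot types, while the single-prompt variant issues one call and relies on parsing a structured generation.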