SQATIN: Supervised Instruction Tuning Meets Question Answering for Improved Dialogue NLU

Evgeniia Razumovskaia; Goran Glavaš; Anna Korhonen; Ivan Vulić

TL;DR

SQATIN addresses sample-efficient dialogue NLU by unifying instruction tuning with a QA-based formulation of intent detection (ID) and value extraction (VE). Starting from the instruction-tuned Flan-T5, it frames each class as a QA prompt and fine-tunes on in-domain examples; because classes are described in natural language, the model transfers robustly across domains and tasks. Empirical results on NLU++ and CLINC-150 show state-of-the-art performance in both in-domain and cross-domain settings, with notable gains in cross-domain VE and strong sample efficiency. The approach also supports parameter-efficient fine-tuning and outperforms in-context learning with large language models, suggesting practical, scalable benefits for task-oriented dialogue (ToD) systems.

Abstract

Task-oriented dialogue (ToD) systems help users execute well-defined tasks across a variety of domains (e.g., $\textit{flight booking}$ or $\textit{food ordering}$), with their Natural Language Understanding (NLU) components being dedicated to the analysis of user utterances, predicting users' intents ($\textit{Intent Detection}$, ID) and extracting values for informational slots ($\textit{Value Extraction}$, VE). In most domains, labelled NLU data is scarce, making sample-efficient learning -- enabled by effective transfer paradigms -- paramount. In this work, we introduce SQATIN, a new framework for dialogue NLU based on (i) instruction tuning and (ii) a question-answering-based formulation of the ID and VE tasks. According to the evaluation on established NLU benchmarks, SQATIN sets a new state of the art in dialogue NLU, substantially surpassing the performance of current models based on standard fine-tuning objectives in both in-domain training and cross-domain transfer. SQATIN yields particularly large performance gains in cross-domain transfer, because our QA-based instruction tuning leverages similarities between natural language descriptions of classes (i.e., slots and intents) across domains.
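To make the QA-based formulation more concrete, the following is a minimal Python sketch of how one annotated utterance could be expanded into one instruction instance per class, with a yes/no target for ID and an extracted value (or a fallback string) for VE. The template wording, the "unanswerable" fallback, the class questions, and all function names are illustrative assumptions, not the exact instructions or code used by SQATIN.

```python
from dataclasses import dataclass

# Hypothetical templates: the exact instruction wording used by SQATIN
# is not given in this text, so these strings are assumptions.
ID_TEMPLATE = 'The user says: "{utterance}". Question: {question} Answer yes or no.'
VE_TEMPLATE = ('The user says: "{utterance}". Question: {question} '
               'Answer with the value or "unanswerable".')


@dataclass
class Instance:
    prompt: str  # text fed to the instruction-tuned model
    target: str  # text the model is trained to generate


def build_id_instances(utterance, intent_questions, gold_intents):
    """Create one yes/no instance per intent class (matching or not)."""
    return [
        Instance(
            ID_TEMPLATE.format(utterance=utterance, question=question),
            "yes" if name in gold_intents else "no",
        )
        for name, question in intent_questions.items()
    ]


def build_ve_instances(utterance, slot_questions, gold_values):
    """Create one instance per slot class; slots without a value get a fallback."""
    return [
        Instance(
            VE_TEMPLATE.format(utterance=utterance, question=question),
            gold_values.get(name, "unanswerable"),
        )
        for name, question in slot_questions.items()
    ]


if __name__ == "__main__":
    # Made-up utterance and class descriptions, for illustration only.
    intents = {
        "wifi": "Is the user asking about wifi?",
        "housekeeping": "Is the user asking about housekeeping?",
    }
    for inst in build_id_instances("Is there free wifi in my room?", intents, {"wifi"}):
        print(inst.target, "<-", inst.prompt)
```

Because each class is expressed as a natural language question, a class from a new domain only needs a new question string, which is the property the abstract credits for the cross-domain gains.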

Paper Structure (13 sections, 8 figures, 15 tables)

This paper contains 13 sections, 8 figures, 15 tables.

Introduction
SQATIN: Methodology
Experimental Setup
Main Evaluation
Further Analyses and Discussion
Related Work
Conclusion
Different Instruction Formulations
Full Cross-Domain Results on CLINC-150 for Different Base Models
Comparison of Single-Task and Multi-Task Models for Cross-Domain Setups
Fine-tuning and Hyperparameters
Results for Different Model Sizes
Instructions with the Multiple Choice Formulation

Figures (8)

Figure 1: Instruction examples for ID and VE: for each we show one example where the class matches the utterance (i.e., for ID: correct intent class; for VE: a value for the slot class present) and one where it does not.
Figure 2: An annotated utterance from NLU++ transformed into corresponding SQATIN instruction instances. For brevity, we display the transformation for only two intents (wifi and housekeeping), but the same transformation was applied for all intents.
Figure 3: Cross-domain transfer results for ID on CLINC-150 for SQATIN and the SotA QA-FT baseline. Full results in tabular format are in the appendix "Full Cross-Domain Results on CLINC-150 for Different Base Models". Diagonal values correspond to in-domain results. Source domains are shown along the vertical axis and target domains along the horizontal axis.
Figure 4: Comparison of ID models on the banking domain of NLU++ for different training data sizes. The results are averages over 3 random seeds.
Figure 5: Full-model fine-tuning ($\approx$ 248M tunable parameters) versus PEFT with Adapters ($\approx$ 1.8M tunable parameters) in in-domain ID and VE.
...and 3 more figures
