A Zero-Shot Open-Vocabulary Pipeline for Dialogue Understanding

Abdulfattah Safa, Gözde Gül Şahin

TL;DR

This paper tackles dialogue state tracking without relying on predefined ontologies by introducing a zero-shot, open-vocabulary pipeline that first classifies domains per turn and then performs DST either as QA or via Self-Refined Prompts (SRP). DST-as-QA targets smaller models with concise multiple-choice questions, while DST-as-SRP provides a structured, single-prompt approach that tracks all slots per turn. Across SGD and MultiWOZ datasets, the SRP method achieves state-of-the-art results in zero-shot settings, with significant reductions in LLM query counts (up to 90% fewer) and notable JGA gains (up to 20%) compared to prior work. The approach demonstrates practical viability by avoiding fixed ontology values, reducing compute, and maintaining high accuracy in diverse domains, making it suitable for dynamic, real-world dialogue systems.

Abstract

Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. The majority of existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, so they struggle to adapt to new slot values. While Large Language Model (LLM)-based systems show promising zero-shot DST performance, they either require extensive computational resources or underperform existing fully trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology, allowing it to adapt dynamically. We compare our approach with the existing SOTA and show that it provides up to 20% better Joint Goal Accuracy (JGA) than previous methods on datasets like MultiWOZ 2.1, with up to 90% fewer requests to the LLM API.
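
Since the headline result is reported in Joint Goal Accuracy, a minimal sketch of how that metric is commonly computed may help: a turn counts as correct only if every slot in the predicted dialogue state matches the gold state. The function name, the slot names, and the three-turn dialogue below are hypothetical illustrations and are not taken from the paper. The sketch also assumes exact string matching, whereas real evaluation scripts often normalize values first.

```python
from typing import Dict, List

# A dialogue state maps slot names to values, e.g. "hotel-area" -> "north".
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Return the fraction of turns whose predicted state equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    # A turn is correct only when all slots and values match exactly.
    exact_matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return exact_matches / len(gold)


# Hypothetical three-turn dialogue (illustrative values only).
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]
predicted_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},  # one wrong slot fails the turn
    {"hotel-area": "north", "hotel-stars": "4", "hotel-parking": "yes"},
]

jga = joint_goal_accuracy(predicted_states, gold_states)
print(f"JGA: {jga:.3f}")
```

In this toy dialogue two of the three turns match exactly. The single wrong value in the second turn makes that whole turn count as incorrect, which is why JGA is a strict metric.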

Paper Structure (33 sections, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Overview of the architecture, comprising two stages: 1. Domain Classification and 2. Dialogue State Tracking (DST). DST can be performed via either 2.a: DST-as-SRP or 2.b: DST-as-QA. The color scheme is as follows: prompts have a cyan background, schema has a blue background, results are in blue, and output stages have a yellow background. (The prompts in this figure are illustrative. For the actual prompts, please refer to the prompt-template appendix.)
  • Figure 2: JGA Comparison Across Domains in MultiWOZ 2.4 Dataset for GPT-4-Turbo and Llama 3 Models
  • Figure 3: Turn Domain Classification Accuracy for MultiWOZ 2.4
  • Figure 4: Average Number of Domains and Domain Change Per Turn for MultiWOZ 2.4
  • Figure 5: Ground-Truth Domains vs Incorrectly Predicted Ones in SGD
  • ...and 2 more figures