Interpretable and Robust Dialogue State Tracking via Natural Language Summarization with LLMs

Rafael Carranza, Mateo Alejandro Rojas

TL;DR

This work introduces NL-DST, a novel framework that uses large language models to generate natural language descriptions of dialogue states instead of traditional slot-value representations. By fine-tuning LLMs on datasets annotated with human-created NL state summaries, NL-DST achieves superior Joint Goal Accuracy and Slot Accuracy on MultiWOZ 2.1 and Taskmaster-1, while offering improved interpretability. Ablation studies and human evaluations confirm the advantages of free-form NL state generation, including robustness to noisy inputs and richer state representations. The approach opens avenues for integrating visual context and external knowledge, enabling more flexible and transparent task-oriented dialogue systems with potential extensions to multimodal and end-to-end architectures.

Abstract

This paper introduces a novel approach to Dialogue State Tracking (DST) that leverages Large Language Models (LLMs) to generate natural language descriptions of dialogue states, moving beyond traditional slot-value representations. Conventional DST methods struggle with open-domain dialogues and noisy inputs. Motivated by the generative capabilities of LLMs, our Natural Language DST (NL-DST) framework trains an LLM to directly synthesize human-readable state descriptions. Through extensive experiments on the MultiWOZ 2.1 and Taskmaster-1 datasets, we demonstrate that NL-DST significantly outperforms rule-based and discriminative BERT-based DST baselines, as well as generative slot-filling GPT-2 DST models, in both Joint Goal Accuracy and Slot Accuracy. Ablation studies and human evaluations further validate the effectiveness of natural language state generation, highlighting its robustness to noise and enhanced interpretability. Our findings suggest that NL-DST offers a more flexible, accurate, and human-understandable approach to dialogue state tracking, paving the way for more robust and adaptable task-oriented dialogue systems.
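For readers unfamiliar with the two metrics named above, the following minimal sketch shows how Joint Goal Accuracy and Slot Accuracy are commonly computed over slot-value states. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names, the slot names, and the convention of treating an absent slot as `"none"` are all hypothetical, and the abstract does not specify how natural language state descriptions are mapped to slot values for scoring.

```python
from typing import Dict, List

# A dialogue state as slot -> value, e.g. {"hotel-area": "north"}.
# This slot-value view is assumed here purely for scoring purposes.
State = Dict[str, str]


def joint_goal_accuracy(preds: List[State], golds: List[State]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same number of turns")
    if not golds:
        return 0.0
    exact = sum(1 for p, g in zip(preds, golds) if p == g)
    return exact / len(golds)


def slot_accuracy(preds: List[State], golds: List[State], slots: List[str]) -> float:
    """Fraction of (turn, slot) pairs predicted correctly.

    A slot missing from a state is treated as "none" (an assumed convention).
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same number of turns")
    total = len(golds) * len(slots)
    if total == 0:
        return 0.0
    correct = sum(
        1
        for p, g in zip(preds, golds)
        for s in slots
        if p.get(s, "none") == g.get(s, "none")
    )
    return correct / total


# Hypothetical two-turn dialogue over two slots.
slots = ["hotel-area", "hotel-stars"]
golds = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
preds = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},
]

jga = joint_goal_accuracy(preds, golds)
sa = slot_accuracy(preds, golds, slots)
```

In this toy example the second turn has one wrong slot, so only one of two turns matches exactly while three of four slot predictions are correct. Joint Goal Accuracy is the stricter of the two metrics because a single slot error fails the whole turn.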

Paper Structure

This paper contains 19 sections, 6 equations, 6 tables.