A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

Wenbo Pan; Qiguang Chen; Xiao Xu; Wanxiang Che; Libo Qin

A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

Wenbo Pan, Qiguang Chen, Xiao Xu, Wanxiang Che, Libo Qin

TL;DR

The paper evaluates ChatGPT on zero-shot dialogue understanding for SLU and DST across ATIS, SNIPS, and MultiWOZ benchmarks, introducing a multi-turn interactive prompting framework to enhance DST. It shows that ChatGPT can perform zero-shot DST and SLU to a meaningful degree, with DST benefiting most from context-aware prompts, while slot-filling in SLU remains challenging and prone to formatting issues. The analysis identifies key factors in prompt design and additional information that boost performance, as well as unexpected model behaviors and prompt-length limitations. These findings offer practical guidance for building zero-shot dialogue understanding systems with large language models and point to directions for future research.

Abstract

Zero-shot dialogue understanding aims to enable dialogue to track the user's needs without any training data, which has gained increasing attention. In this work, we investigate the understanding ability of ChatGPT for zero-shot dialogue understanding tasks including spoken language understanding (SLU) and dialogue state tracking (DST). Experimental results on four popular benchmarks reveal the great potential of ChatGPT for zero-shot dialogue understanding. In addition, extensive analysis shows that ChatGPT benefits from the multi-turn interactive prompt in the DST task but struggles to perform slot filling for SLU. Finally, we summarize several unexpected behaviors of ChatGPT in dialogue understanding tasks, hoping to provide some insights for future research on building zero-shot dialogue understanding systems with Large Language Models (LLMs).

A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

TL;DR

Abstract

A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding

Authors

TL;DR

Abstract

Table of Contents

Figures (1)