Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries

Seanie Lee, Jianpeng Cheng, Joris Driesen, Alexandru Coca, Anders Johannsen

TL;DR

This work tackles few-shot dialogue state tracking (DST) by reframing retrieval around text summaries of conversations rather than raw dialogue. It introduces CONVERSE, a dual-encoder with soft history grounding that aligns conversations with their implicit summaries, enabling efficient maximum inner product search. To avoid online cost from generating summaries at test time, CONVERSE distills a lightweight encoder that embeds dialogues into a vector space similar to their summaries. Empirical results on MultiWOZ with GPT-Neo and LLaMA show substantial gains over strong baselines and better generalization to unseen domains, demonstrating practical impact for scalable, domain-robust DST with minimal annotated data.

Abstract

Few-shot dialogue state tracking (DST) with Large Language Models (LLMs) relies on an effective and efficient conversation retriever to find similar in-context examples for prompt learning. Previous works use raw dialogue context as search keys and queries, and fine-tune the retriever on annotated dialogues to achieve superior performance. However, this approach is less suited to scaling to new domains or new annotation languages, where fine-tuning data is unavailable. To address this problem, we handle the task of conversation retrieval based on text summaries of the conversations. An LLM-based conversation summarizer is adopted for query and key generation, which enables effective maximum inner product search. To avoid the extra inference cost of LLM-based conversation summarization, we further distill a lightweight conversation encoder that produces query embeddings without decoding summaries for test conversations. We validate our retrieval approach on MultiWOZ datasets with GPT-Neo-2.7B and LLaMA-7B/30B. The experimental results show a significant improvement over relevant baselines in real few-shot DST settings.
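
The retrieval step the abstract describes can be illustrated with a minimal sketch of maximum inner product search over summary embeddings. This is not the authors' implementation: the function names, the three-dimensional vectors, and the dialogue labels below are all hypothetical, and the embeddings are hand-written stand-ins for what a trained encoder would produce. The sketch only shows the key/query/value arrangement, where each stored dialogue (value) is indexed by the embedding of its summary (key) and a test conversation is scored against the keys by inner product.

```python
from typing import List, Sequence, Tuple


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    query: Sequence[float],
    keys: List[Sequence[float]],
    values: List[str],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Exhaustive maximum inner product search.

    Scores the query against every key and returns the k
    highest-scoring (score, value) pairs, best first.
    """
    scored = [(inner_product(query, key), value) for key, value in zip(keys, values)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical summary embeddings (keys), one per stored dialogue.
summary_keys = [
    [0.9, 0.1, 0.0],  # summary of a hotel-booking dialogue
    [0.1, 0.8, 0.1],  # summary of a train-booking dialogue
    [0.0, 0.2, 0.9],  # summary of a restaurant dialogue
]
# The stored dialogues (values) that would be used as in-context examples.
dialogues = ["hotel_dialogue", "train_dialogue", "restaurant_dialogue"]

# Hypothetical embedding of a test conversation (query).
query_embedding = [0.8, 0.2, 0.1]

top = retrieve_top_k(query_embedding, summary_keys, dialogues, k=2)
retrieved = [value for _, value in top]
print(retrieved)
```

The exhaustive scan is used here only for clarity; with a large pool of examples, an approximate or indexed inner product search would normally replace it.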

Paper Structure (33 sections, 3 equations, 5 figures, 14 tables)

Figures (5)

  • Figure 1: Comparison between (a) off-the-shelf retriever with query generation and (b) CONVERSE w/o query generation.
  • Figure 2: Concept. (a) Generating a summary of a dialogue with a language model (LM). (b) Training the retriever to maximize the similarity between the dialogue and the generated summary. (c) Given a test dialogue as a query, we retrieve the dialogue (value) whose summary (key) obtains the best similarity score with the query.
  • Figure 3: JGA of LLaMA-7B with CONVERSE as a function of the number of labeled data.
  • Figure 4: Visualization of importance scores. Tokens in darker blue get larger weights based on the latest user utterance.
  • Figure 5: A screenshot of the instruction for human evaluation on summaries generated by gpt-3.5-turbo.