Modeling the One-to-Many Property in Open-Domain Dialogue with LLMs

Jing Yang Lee, Kong-Aik Lee, Woon-Seng Gan

TL;DR

This work tackles the one-to-many property of open-domain dialogue by proposing a two-stage framework: Multi-Response Generation (MRG) to produce a diverse set of coherent responses, followed by Preference-based Selection (PS) to pick the best option via a learned preference model (ODRP). It introduces o2mDial, a dataset explicitly capturing multiple plausible responses per context, and develops new evaluation metrics for diversity and coherence. The approach, tested on smaller LLMs with in-context learning and instruction tuning, yields significant improvements in response diversity while preserving quality, approaching the performance of larger models and highlighting the practical feasibility of resource-efficient open-domain dialogue systems. The combination of o2mDial, MRG methods, and PS demonstrates a scalable path to richer, more engaging conversations in settings with limited computational resources.

Abstract

Open-domain Dialogue (OD) exhibits a one-to-many (o2m) property, whereby multiple appropriate responses exist for a single dialogue context. Despite prior research showing that modeling this property boosts response diversity, most modern LLM-based dialogue agents do not explicitly do so. In this work, we model the o2m property of OD in LLMs by decomposing OD generation into two key tasks: Multi-Response Generation (MRG) and Preference-based Selection (PS). MRG generates a set of n semantically and lexically diverse high-quality responses for a given dialogue context; PS then selects a single response from that set based on human preference. To facilitate MRG and PS, we introduce o2mDial, a dialogue corpus explicitly designed to capture the o2m property by featuring multiple plausible responses for each context. Leveraging o2mDial, we propose new in-context learning and instruction-tuning strategies, as well as novel evaluation metrics for MRG, alongside a model-based approach for PS. Empirical results demonstrate that applying the proposed two-stage framework to smaller LLMs for OD generation enhances overall response diversity while maintaining contextual coherence, improving response quality by up to 90% and bringing them closer to the performance of larger models.
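
As a rough illustration of the two-stage decomposition described in the abstract, the sketch below generates n candidate responses and then selects the one a preference scorer ranks highest. It is not the authors' implementation: `select_response`, `toy_generate`, and `toy_score` are hypothetical names, the generator is a fixed list standing in for an LLM, and the scorer is a word-count placeholder standing in for a learned preference model.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   Generator: (context, n) -> list of up to n candidate responses  (MRG stage)
#   Scorer:    (context, response) -> preference score, higher is better  (PS stage)
Generator = Callable[[str, int], List[str]]
Scorer = Callable[[str, str], float]


def select_response(
    context: str, generate: Generator, score: Scorer, n: int = 5
) -> Tuple[str, List[str]]:
    """Generate n candidates, then return the top-scoring one and the candidate set."""
    candidates = generate(context, n)
    if not candidates:
        raise ValueError("generator returned no candidates")
    # Drop exact duplicates while preserving order.
    unique = list(dict.fromkeys(candidates))
    # Pick the candidate with the highest preference score.
    best = max(unique, key=lambda response: score(context, response))
    return best, unique


def toy_generate(context: str, n: int) -> List[str]:
    """Placeholder for an LLM: returns the first n entries of a fixed pool."""
    pool = [
        "I love hiking too!",
        "Where do you usually hike?",
        "I love hiking too!",
        "Sounds fun, I prefer cycling though.",
    ]
    return pool[:n]


def toy_score(context: str, response: str) -> float:
    """Placeholder for a preference model: simply favors longer responses."""
    return float(len(response.split()))


best, candidates = select_response(
    "I went hiking last weekend.", toy_generate, toy_score, n=4
)
print(best)
print(len(candidates))
```

In a real system the generator would be an LLM prompted or tuned to emit a diverse response set, and the scorer would be a model trained on human preference judgments; keeping the two behind separate callables lets each stage be swapped or evaluated independently.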

Paper Structure

This paper contains 18 sections, 5 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: One-to-many property of open-domain dialogue.
  • Figure 2: Sample dialogue context and response set pair from our corpus.
  • Figure 3: Prompt template for the Few-Shot prompt.
  • Figure 4: Prompt template for the Chain-of-Thought prompt.
  • Figure 5: Prompt template for the Prompt Chain (PC).
  • ...and 2 more figures