Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues?

Bo-Ru Lu; Nikita Haduong; Chia-Hsuan Lee; Zeqiu Wu; Hao Cheng; Paul Koester; Jean Utke; Tao Yu; Noah A. Smith; Mari Ostendorf

TL;DR

This work tackles the privacy barrier to sharing human-human dialogues by proposing DialGen, a human-in-the-loop dialogue generation framework that synthesizes long, complex call-center conversations for information extraction (IE). In DialGen, a language model and human reviewers work together to generate, edit, and annotate synthetic dialogues guided by an ontology, which gives controlled coverage of diverse entity-slot-value information. The authors introduce an entity-centric IE scoring scheme and show that synthetic data, combined with a small set of real conversations, improves $F_1$ by 25% (relative) on a private auto-insurance call-center IE task, with notable gains in recall and slot-value accuracy. The approach offers a practical path to building rich, privacy-preserving dialogue datasets and to improving information extraction in privacy-constrained domains.
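To make the entity-slot-value framing concrete, below is a minimal exact-match scorer over triplets. It is a simplified sketch under stated assumptions, not the paper's entity-centric scoring scheme: `Triplet`, `triplet_prf`, the lowercase exact-match rule, and the example values are all hypothetical. The example mirrors Figure 1, where the same slot (Damage Part) can belong to different entities (Caller, Other Driver).

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """One extracted fact: which entity, which slot, and the slot's value."""
    entity: str
    slot: str
    value: str


def triplet_prf(predicted: Iterable[Triplet],
                reference: Iterable[Triplet]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of triplets.

    Simplifying assumption: a prediction counts only if entity, slot and
    value all match a reference triplet exactly (after trimming and lowercasing).
    """
    def norm(t: Triplet) -> Triplet:
        return Triplet(t.entity.strip().lower(), t.slot.strip().lower(),
                       t.value.strip().lower())

    pred = {norm(t) for t in predicted}
    ref = {norm(t) for t in reference}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical values; the slot name follows the Figure 1 example.
    reference = [Triplet("Caller", "Damage Part", "front bumper"),
                 Triplet("Caller", "Damage Part", "headlight"),
                 Triplet("Other Driver", "Damage Part", "rear door")]
    predicted = [Triplet("Caller", "Damage Part", "Front Bumper"),
                 Triplet("Other Driver", "Damage Part", "headlight")]
    print(tuple(round(x, 3) for x in triplet_prf(predicted, reference)))
    # prints (0.5, 0.333, 0.4)
```

Here the Other Driver prediction has the right entity and slot but the wrong value, so it counts as a miss, and the Caller's headlight is never predicted. Exact matching is the strictest choice; partial-credit variants are possible but not shown.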

Abstract

The capabilities of pretrained language models have opened opportunities to explore new application areas, but applications involving human-human interaction are limited by the fact that most data is protected from public release for privacy reasons. Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections, preventing successful domain transfer. To support information extraction (IE) for a private call center dataset, we introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues. In IE experiments with auto insurance call center dialogues, we observe 25% relative improvement in $F_1$ after augmenting a small set of real human conversations with synthetic data. We release code and our synthetic dataset to illustrate the complexity of real-world call center conversations and encourage development of complex dialogue datasets that are more representative of natural data.

Paper Structure (50 sections, 5 equations, 7 figures, 12 tables)

Introduction
Dialogue Generation (DialGen)
Prompt for Dialogue Generation
Task Description.
Entity-slot-value Triplets.
Story.
Personality.
Dialogue History.
Subdialogue Generation
Human-in-the-loop Review.
Annotation.
Problem Definition and Evaluation
Problem Definition
Definition of Extracted Information
Evaluation
...and 35 more sections

Figures (7)

Figure 1: An illustrative snippet of our dialogue with entity-slot-value triples. Yellow is the slot with multiple values. Italic blue and yellow are the same slot (Damage Part) with different entities (e.g., Caller and Other Driver). Red is a slot with a value update.
Figure 2: In the DialGen framework, a language model (LM) and a human reviewer collaborate to generate a dialogue. First, a story is created by the LM, using randomly sampled entity-slot-value triplets from the ontology. Second, the LM generates a subdialogue, using a task description, triplets, story, personalities, and dialogue history. The reviewer evaluates how the subdialogue fits with the task requirements and dialogue history. If not satisfied, the reviewer can have the LM regenerate the subdialogue before revising it. The revised subdialogue is added to the dialogue history for generating the next subdialogue. This iterative process continues until the dialogue is complete.
Figure 3: CB precision and recall scores on the AIC test set. All scores are based on T5-SC models.
Figure 4: $\textsc{TLB-}F_1$ scores for T5-SC on AIC test set by varying the amount of DialGen-AIC training data.
Figure 5: $\textsc{tlb}$ and three diagnostic scores for precision and recall ($m_{\textsc{r}}$, $m_{\textsc{rs}}$, and $m_{\textsc{sv}}$) for the T5-SC model on AIC test set.
...and 2 more figures
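The collaborative loop described in the Figure 2 caption can be sketched as follows. This is a hypothetical Python sketch, not the released DialGen code: `generate_dialogue`, `build_prompt`, `Review`, the prompt layout, the retry and length limits, and the example slots and values are illustrative assumptions, and `lm` and `reviewer` are stand-ins for a real language model call and a human review step.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

Triplet = tuple[str, str, str]  # (entity, slot, value)


@dataclass
class Review:
    """Hypothetical record of one human review decision on an LM draft."""
    regenerate: bool                 # True: discard the draft and ask the LM again
    revised: Optional[str] = None    # reviewer-edited subdialogue; None keeps the draft
    dialogue_complete: bool = False  # True: stop adding subdialogues


def build_prompt(task: str, triplets: list[Triplet], story: str,
                 personalities: dict[str, str], history: list[str]) -> str:
    """Join the prompt components named in Figure 2; the layout is an assumption."""
    facts = "\n".join(f"{e} | {s} | {v}" for e, s, v in triplets)
    people = "\n".join(f"{role}: {desc}" for role, desc in personalities.items())
    return (f"Task: {task}\nTriplets:\n{facts}\nStory: {story}\n"
            f"Personalities:\n{people}\nDialogue history:\n") + "\n".join(history)


def generate_dialogue(ontology: list[Triplet], task: str, personalities: dict[str, str],
                      lm: Callable[[str], str],
                      reviewer: Callable[[str, list[str]], Review],
                      n_triplets: int = 3, max_subdialogues: int = 20,
                      max_retries: int = 3, seed: int = 0) -> list[str]:
    """Iterative LM + reviewer loop; returns the accepted subdialogues in order."""
    rng = random.Random(seed)
    # Step 1: sample triplets from the ontology and let the LM write a story around them.
    triplets = rng.sample(ontology, k=min(n_triplets, len(ontology)))
    story = lm("Write a story involving: " + "; ".join(" / ".join(t) for t in triplets))
    history: list[str] = []
    for _ in range(max_subdialogues):
        prompt = build_prompt(task, triplets, story, personalities, history)
        # Step 2: the LM drafts a subdialogue; the reviewer may ask for a new draft.
        draft = lm(prompt)
        review = reviewer(draft, history)
        retries = 0
        while review.regenerate and retries < max_retries:
            draft = lm(prompt)
            review = reviewer(draft, history)
            retries += 1
        # Step 3: the revised subdialogue (or the last draft) extends the history,
        # which conditions the prompt for the next subdialogue.
        history.append(review.revised if review.revised is not None else draft)
        if review.dialogue_complete:
            break
    return history


if __name__ == "__main__":
    # Stand-in LM and reviewer so the loop runs without a model or a human.
    counter = [0]

    def fake_lm(prompt: str) -> str:
        counter[0] += 1
        return f"[draft {counter[0]}]"

    def fake_reviewer(draft: str, history: list[str]) -> Review:
        # Accept every draft with a light edit; finish after three subdialogues.
        return Review(regenerate=False, revised=draft + " (edited)",
                      dialogue_complete=len(history) >= 2)

    ontology = [("Caller", "Damage Part", "front bumper"),
                ("Other Driver", "Damage Part", "rear door"),
                ("Caller", "Injury", "none")]
    for turn in generate_dialogue(ontology, "auto insurance claim call",
                                  {"Agent": "patient", "Caller": "anxious"},
                                  fake_lm, fake_reviewer):
        print(turn)
```

The cap on retries is a choice of this sketch to keep the loop finite; in the workflow the caption describes, the human reviewer decides when a draft is acceptable and when the dialogue is complete.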
