Data Augmentation Integrating Dialogue Flow and Style to Adapt Spoken Dialogue Systems to Low-Resource User Groups

Zhiyang Qi, Michimasa Inaba

TL;DR

This work tackles the challenge of adapting spoken dialogue systems (SDSs) to low-resource user groups, such as minors, by proposing a targeted data augmentation framework. It combines three components: (i) extracting abstract speaker styles with a large language model (LLM), (ii) generating dialogue act (DA) histories with a pre-trained language model (PLM) finetuned in two rounds, and (iii) synthesizing training dialogues with ChatGPT to enrich the data for DA prediction. Across low-resource splits of the Travel Agency Task Dialogue Corpus, the approach improves both exact and partial DA-prediction metrics over non-augmented baselines, although training on the full-resource data still yields the strongest performance. By modeling diverse, data-scarce user groups more accurately, the method supports more inclusive SDSs and offers a scalable path to adapting them to other demographics.
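The TL;DR reports gains on "exact" and "partial" DA-prediction metrics without defining them here. Below is a minimal Python sketch under one plausible reading, which is an assumption and not the paper's stated definition: each system turn carries a set of DA labels, an exact match requires the predicted set to equal the gold set, and a partial match requires at least one shared label. The function name `da_match_scores` and the DA labels in the example are hypothetical.

```python
from typing import List, Set, Tuple


def da_match_scores(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (exact, partial) match rates over aligned system turns.

    Assumed definitions (illustrative, not taken from the paper):
      exact   -> the predicted DA set equals the gold DA set
      partial -> the predicted and gold DA sets share at least one label
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    exact = sum(g == p for g, p in zip(gold, pred))
    partial = sum(bool(g & p) for g, p in zip(gold, pred))
    return exact / len(gold), partial / len(gold)


if __name__ == "__main__":
    # Hypothetical DA labels for three system turns.
    gold = [{"greeting"}, {"question", "suggestion"}, {"answer"}]
    pred = [{"greeting"}, {"question"}, {"confirmation"}]
    exact, partial = da_match_scores(gold, pred)
    print(f"exact={exact:.2f} partial={partial:.2f}")  # exact=0.33 partial=0.67
```

Under this reading every exact match is also a partial match, so the partial score can never be lower than the exact score.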

Abstract

This study addresses the interaction challenges encountered by spoken dialogue systems (SDSs) when engaging with users who exhibit distinct conversational behaviors, particularly minors, in scenarios where data are scarce. We propose a novel data augmentation framework to enhance SDS performance for user groups with limited resources. Our approach leverages a large language model (LLM) to extract speaker styles and a pre-trained language model (PLM) to simulate dialogue act history. This method generates enriched and personalized dialogue data, facilitating improved interactions with unique user demographics. Extensive experiments validate the efficacy of our methodology, highlighting its potential to foster the development of more adaptive and inclusive dialogue systems.
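The three steps (style extraction, DA-history generation, dialogue synthesis) can be wired together roughly as follows. This is a minimal sketch, not the authors' code: the LLM, the finetuned PLM, and ChatGPT are injected as plain callables, and the names `AugmentationInput`, `build_augmentation_prompt`, and `augment`, as well as the prompt wording, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AugmentationInput:
    speaker_styles: List[str]  # abstract style descriptions from the LLM (step 1)
    da_history: List[str]      # system DA sequence from the finetuned PLM (step 2)


def build_augmentation_prompt(inp: AugmentationInput) -> str:
    """Assemble a generation prompt (hypothetical wording, not the paper's prompt)."""
    styles = "\n".join(f"- {s}" for s in inp.speaker_styles)
    turns = "\n".join(f"{i}. {da}" for i, da in enumerate(inp.da_history, start=1))
    return (
        "Write a travel-agency dialogue between an agent and a customer.\n"
        f"The customer's speaking style:\n{styles}\n"
        f"The agent's dialogue acts, in order:\n{turns}\n"
        "Label each agent turn with its dialogue act."
    )


def augment(
    source_dialogues: List[str],
    extract_styles: Callable[[str], List[str]],
    generate_da_history: Callable[[], List[str]],
    chat: Callable[[str], str],
) -> List[str]:
    """Run the three steps once per target-group dialogue; collect synthetic dialogues."""
    synthetic = []
    for dialogue in source_dialogues:
        inp = AugmentationInput(extract_styles(dialogue), generate_da_history())
        synthetic.append(chat(build_augmentation_prompt(inp)))  # step 3
    return synthetic


if __name__ == "__main__":
    # Stub callables stand in for the LLM, the finetuned PLM, and ChatGPT.
    out = augment(
        ["<dialogue with a target-group customer>"],
        extract_styles=lambda d: ["short, casual sentences", "frequent hesitations"],
        generate_da_history=lambda: ["greeting", "question", "suggestion"],
        chat=lambda prompt: prompt,  # echo the prompt so it can be inspected
    )
    print(out[0])
```

Passing the models in as callables keeps the orchestration testable with stubs; a real setup would replace them with actual model or API calls.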

Paper Structure

This paper contains 19 sections, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Real human-to-human conversations. Speakers adopt various dialogue acts when interacting with users employing diverse speaking styles.
  • Figure 2: Our data augmentation framework is designed to improve the performance of the PLM in predicting DAs when interacting with low-resource users who exhibit unique speaking styles. Beginning with dialogues that involve specific target users, we (1) extract speaker styles, (2) generate DA histories of system interactions with these users, and (3) input this information into ChatGPT for tailored data augmentation.
  • Figure 3: DA History Generation. We conduct two rounds of finetuning, the first on all available data and the second only on data from the target user group, so that the generated DA history aligns more closely with the target demographic.
  • Figure 4: Dialogues generated by the variant without speaker styles and by our approach.
  • Figure 5: Prompt for Speaker Styles Extraction.
  • ...and 2 more figures
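The two-round schedule in Figure 3 amounts to reusing one model across two training passes, with a narrower dataset in the second. The sketch below shows only that data schedule, assuming a hypothetical `finetune(model, examples)` callable that would wrap the actual PLM training; `Example`, `two_round_finetune`, and the group labels are illustrative names, not from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


@dataclass(frozen=True)
class Example:
    user_group: str              # hypothetical labels, e.g. "minor" or "adult"
    da_history: Tuple[str, ...]  # sequence of system DA labels


def two_round_finetune(
    model: Model,
    data: Sequence[Example],
    target_group: str,
    finetune: Callable[[Model, List[Example]], Model],
) -> Model:
    """Round 1 trains on all data; round 2 continues on the target group only."""
    target = [ex for ex in data if ex.user_group == target_group]
    if not target:
        raise ValueError(f"no examples for target group {target_group!r}")
    model = finetune(model, list(data))  # round 1: all available data
    return finetune(model, target)       # round 2: target user group only


if __name__ == "__main__":
    data = [
        Example("adult", ("greeting", "question")),
        Example("minor", ("greeting", "suggestion")),
        Example("adult", ("answer",)),
    ]
    # Stub "model": a list recording how many examples each round saw.
    log = two_round_finetune([], data, "minor", lambda m, batch: m + [len(batch)])
    print(log)  # [3, 1]
```

Checking for target-group data before the first round avoids spending a full training pass only to fail at the second.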