Data Augmentation Integrating Dialogue Flow and Style to Adapt Spoken Dialogue Systems to Low-Resource User Groups
Zhiyang Qi, Michimasa Inaba
TL;DR
This work tackles the challenge of adapting spoken dialogue systems (SDSs) to low-resource user groups, such as minors, by proposing a targeted data augmentation framework. It combines three components: (i) extracting abstract speaker styles with a large language model (LLM), (ii) generating dialogue act (DA) histories via dual-finetuned pre-trained language models (PLMs), and (iii) synthesizing training dialogues with ChatGPT to enrich DA prediction data. Across low-resource splits of the Travel Agency Task Dialogue Corpus, the approach improves exact and partial DA-prediction metrics relative to non-augmented baselines, though full-resource data still yields the strongest performance. The method advances inclusive SDSs by enabling more accurate modeling of diverse, data-scarce user groups and offers a scalable path to broader demographic adaptation.
Abstract
This study addresses the interaction challenges that spoken dialogue systems (SDSs) encounter when engaging with users who exhibit distinct conversational behaviors, particularly minors, in scenarios where data are scarce. We propose a novel data augmentation framework to enhance SDS performance for low-resource user groups, that is, groups for which little dialogue data is available. Our approach leverages a large language model (LLM) to extract speaker styles and a pre-trained language model (PLM) to simulate dialogue act history. The method generates enriched and personalized dialogue data, facilitating improved interactions with distinct user groups. Extensive experiments validate the efficacy of our methodology, highlighting its potential to foster the development of more adaptive and inclusive dialogue systems.
