Speak & Spell: LLM-Driven Controllable Phonetic Error Augmentation for Robust Dialogue State Tracking

Jihyun Lee; Solee Im; Wonjun Lee; Gary Geunbae Lee

TL;DR

A simple yet effective data augmentation method that targets named entity errors from Automatic Speech Recognition (ASR) systems to improve the robustness of Dialogue State Tracking (DST) models. It generates sufficient error patterns on keywords, which improves accuracy in noisy and low-accuracy ASR environments.

Abstract

Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of DST models. Our novel method controls where errors are placed by using keyword-highlighted prompts, and the errors it introduces are phonetically similar to the original words. As a result, our method generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
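
To make the idea of a keyword-highlighted prompt concrete, the following is a minimal Python sketch, not the authors' implementation. The asterisk marker, the prompt wording, and the function names `highlight_keywords` and `build_prompt` are all assumptions made for illustration; the abstract does not specify them. The sketch only builds the prompt text and does not call any LLM.

```python
import re


def highlight_keywords(utterance: str, keywords: list[str], marker: str = "*") -> str:
    """Wrap each keyword occurrence in a marker (hypothetical convention)."""
    if not keywords:
        return utterance
    # One alternation pattern, longest keyword first, so that a keyword
    # contained in a longer keyword is not wrapped separately.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", utterance)


def build_prompt(utterance: str, keywords: list[str]) -> str:
    """Build an illustrative prompt asking for phonetic errors on marked words only."""
    marked = highlight_keywords(utterance, keywords)
    # The instruction text below is an assumed template, not taken from the paper.
    return (
        "Rewrite the sentence as a speech recognizer might mishear it.\n"
        "Replace only the words wrapped in * with phonetically similar words "
        "and keep everything else unchanged.\n"
        f"Sentence: {marked}\n"
        "Rewritten:"
    )


if __name__ == "__main__":
    print(build_prompt("I need a taxi to Pizza Hut Fen Ditton", ["Pizza Hut Fen Ditton"]))
```

Highlighting the named entity in the input is what lets the prompt restrict errors to the keyword span, while the instruction text asks for phonetically similar replacements there.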