Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets
Lucas Druart, Valentin Vielzeuf, Yannick Estève
TL;DR
The paper tackles the challenge of enriching spoken Task-Oriented Dialogue (TOD) datasets with fine-grained semantic representations, for which high-quality automatic annotations are lacking. It proposes a semi-automatic annotation pipeline built around a Structured Contextual Meaning Representation ontology, using LoRA-based fine-tuning of an open-weight LLM and grammar-constrained decoding to produce structured AMR-like trees. Experiments on the MEDIA French hotel reservation dataset show that fine-tuned models match human annotations more closely, in terms of semantic match, than prompting alone, and that iterative filtering and combining grammar-constrained outputs can improve annotation quality. The approach offers a scalable, low-cost path to high-quality spoken-dialogue annotations, with potential for multi-domain and multilingual extension.
Abstract
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The course of the dialogue thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets lag behind. This paper provides insights into the automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: we (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.
