Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets

Lucas Druart, Valentin Vielzeuf, Yannick Estève

TL;DR

The paper tackles the challenge of enriching spoken TOD datasets with fine-grained semantic representations, addressing the lack of high-quality automatic annotations. It proposes a semi-automatic annotation pipeline built around a Structured Contextual Meaning Representation ontology, leveraging LoRA-based fine-tuning of an open-weight LLM and grammar-constrained decoding to produce structured AMR-like trees. Experiments on the MEDIA French hotel reservation dataset show that fine-tuned models achieve higher Smatch (semantic match) scores against human annotations than prompting alone, and that iterative filtering and combining grammar-constrained outputs can improve annotation quality. The approach offers a scalable, low-cost path to high-quality spoken-dialogue annotations with potential for multi-domain and multilingual extension.

Abstract

In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The course of the dialogue thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets fall behind. This paper provides insights into the automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: we (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.

Paper Structure (16 sections, 1 equation, 3 figures, 2 tables, 1 algorithm)

Figures (3)

  • Figure 1: Example of enriched annotation for a user turn of the MEDIA dataset. It can be translated as "I would like to book err two double bedrooms and one single bedroom for the five days of Christmas at Christmas so in the eighth district of Paris.". Node identifiers are in red, node types in blue, relation types in green and structure in black. Transcription spans are quoted.
  • Figure 2: Overview of the semi-automatic annotation pipeline with the parameters of each step.
  • Figure 3: Smatch scores distribution of pairwise comparisons of human and automatic annotations.