CoDeTT: A Context-Aware Decision Benchmark for Turn-Taking Evaluation

Huan Shen, Yingao Wang, Shangkun Huang, Wei Zou, Yunzhang Chen

Abstract

Turn-taking modeling is fundamental to spoken dialogue systems, yet its evaluation remains fragmented and often limited to binary boundary detection under narrow interaction settings. Such protocols hinder systematic comparison and obscure model weaknesses across conversational conditions. We present CoDeTT, a context-aware decision benchmark for turn-taking evaluation. CoDeTT formulates turn-taking as a structured decision problem and constructs a multi-scenario dataset with fine-grained decision categories and controlled context variations. Under a unified evaluation protocol, we assess representative existing models and observe substantial performance disparities across decision types and interaction scenarios. CoDeTT provides a standardized benchmark for systematic and context-aware evaluation of turn-taking systems. The benchmark dataset and evaluation toolkit are available at https://yingaowang-casia.github.io/CoDeTT.github.io/.

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: A conceptual comparison between traditional action-based benchmarks and the proposed CoDeTT intent-based diagnostic benchmark.
  • Figure 2: The CoDeTT dataset construction pipeline.
  • Figure 3: Fine-grained semantic confusion matrix of GPT-4o-audio (Chinese, 3-turn history).