DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation

Weichuang Shao; Iman Yi Liao; Tomas Henrique Bode Maul; Tissa Chandesa

TL;DR

DHAuDS addresses domain shift in audio classification by introducing a unified benchmark that simulates realistic, dynamic, and heterogeneous acoustic degradations. It defines four datasets (UrbanSound8K-C, SpeechCommandsV2-C, VocalSound-C, and ReefSet-C), each with dynamic severity and mixed-noise conditions, accompanied by 14 evaluation criteria per benchmark (8 for UrbanSound8K-C), for 50 unrepeated criteria across 124 experiments. The methodology combines entropy-based TTA losses with a consistency loss over two temporally shifted views, and employs a binary learning-rate strategy to improve stability and performance during adaptation. Findings show consistent post-adaptation gains across datasets and models (HuBERT, AMAuT, CoNMix++), along with insights into hyperparameter stability and trade-offs. The benchmark emphasizes reproducibility and real-world relevance for advancing robust, adaptive audio modeling.
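The loss combination described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not give the exact form of either term, so the sketch assumes Shannon entropy of the softmax output for the entropy term, a mean squared difference between the two views' predictions for the consistency term, and a circular shift for the temporal shift. The names `time_shift`, `consistency`, `tta_loss`, and `weight` are hypothetical. The binary learning-rate strategy is not sketched because its details are not stated here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -sum(p * math.log(p + eps) for p in probs)


def time_shift(waveform, offset):
    """Circularly shift a list of samples to the right by `offset` (assumed shift type)."""
    if not waveform:
        return []
    k = offset % len(waveform)
    return waveform[-k:] + waveform[:-k] if k else list(waveform)


def consistency(p, q):
    """Mean squared difference between two predictions (assumed form of the consistency term)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def tta_loss(logits_a, logits_b, weight=1.0):
    """Average entropy of both views plus a weighted consistency penalty."""
    p, q = softmax(logits_a), softmax(logits_b)
    ent = 0.5 * (entropy(p) + entropy(q))
    return ent + weight * consistency(p, q)
```

In use, a model would produce `logits_a` and `logits_b` from two shifted copies of the same clip (for example `time_shift(x, 0)` and `time_shift(x, 160)`), and the returned value would be minimized during adaptation. Confident, agreeing predictions give a low loss; uncertain or disagreeing predictions give a higher one.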

Abstract

Audio classifiers frequently face domain shift, in which models trained on one dataset lose accuracy on data recorded under acoustically different conditions. Previous Test-Time Adaptation (TTA) research in speech and sound analysis often evaluates models under fixed or mismatched noise settings that fail to mimic real-world variability. To overcome these limitations, this paper presents DHAuDS (Dynamic and Heterogeneous Audio Domain Shift), a benchmark designed to assess TTA approaches under more realistic and diverse acoustic shifts. DHAuDS comprises four standardized benchmarks: UrbanSound8K-C, SpeechCommandsV2-C, VocalSound-C, and ReefSet-C, each constructed with dynamic corruption severity levels and heterogeneous noise types to simulate authentic audio degradation scenarios. The framework defines 14 evaluation criteria for each benchmark (8 for UrbanSound8K-C), resulting in 50 unrepeated criteria (124 experiments) that collectively enable fair, reproducible, and cross-domain comparison of TTA algorithms. By including dynamic and mixed-domain noise settings, DHAuDS offers a consistent and publicly reproducible testbed to support ongoing studies in robust and adaptive audio modeling.
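One way to read the criteria count stated above, assuming the three benchmarks other than UrbanSound8K-C each contribute 14 criteria and UrbanSound8K-C contributes 8:

```latex
\underbrace{3 \times 14}_{\text{SpeechCommandsV2-C, VocalSound-C, ReefSet-C}}
\;+\;
\underbrace{8}_{\text{UrbanSound8K-C}}
\;=\; 50
```

The figure of 124 experiments is not derivable from the abstract alone and is left as stated.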
