Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models
Guilherme Zucatelli, Ricardo Barioni, Gabriela Dantas
TL;DR
Acoustic non-stationarity assessment is limited by the heavy computational cost of traditional measures such as the Index of Non-Stationarity (INS). The authors introduce Hard Label Criteria (HLC). HLC evaluates INS region by region and applies adaptive thresholds to assign each signal a single global binary non-stationarity label. These labels allow supervised models to be trained as efficient non-stationarity estimators. The authors also design NANSA, a transformer-based network with an ANS Encoder and a Pattern Extractor that is trained with binary cross-entropy on HLC labels, together with a lightweight variant, NANSALW. Across AudioSet, DCASE, and FSD50K, NANSA and NANSALW consistently outperform baselines in accuracy and reduce processing time by orders of magnitude relative to INS, making real-time, on-device non-stationarity assessment feasible. The work provides a scalable framework for objective non-stationarity estimation, with practical relevance to speech processing, audio scene analysis, and related tasks.
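The general idea of turning region-wise decisions into one hard label, and then training against that label with binary cross-entropy, can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions. The INS computation is replaced by a hypothetical toy statistic, `toy_region_index`, which is the absolute log energy ratio between the two halves of a region. The adaptive thresholds are replaced by a fixed `threshold`. The rule that aggregates region flags into a global label (`min_fraction`) is also hypothetical. Only `binary_cross_entropy` is a standard formula: the per-example loss a classifier would minimize against such labels.

```python
import math
from typing import Callable, Sequence


def toy_region_index(region: Sequence[float]) -> float:
    """Hypothetical stand-in for INS: |log| energy ratio between region halves."""
    half = len(region) // 2
    e_first = sum(x * x for x in region[:half]) + 1e-12   # small floor avoids log(0)
    e_second = sum(x * x for x in region[half:]) + 1e-12
    return abs(math.log(e_second / e_first))


def hard_label(signal: Sequence[float],
               region_len: int,
               region_index: Callable[[Sequence[float]], float] = toy_region_index,
               threshold: float = 1.0,
               min_fraction: float = 0.5) -> int:
    """Return 1 (non-stationary) or 0 (stationary) for the whole signal.

    Assumed rule (not from the paper): split the signal into non-overlapping
    regions (trailing samples that do not fill a region are ignored), flag each
    region whose index exceeds `threshold`, and label the signal non-stationary
    when the fraction of flagged regions reaches `min_fraction`.
    """
    if region_len <= 0:
        raise ValueError("region_len must be positive")
    regions = [signal[i:i + region_len]
               for i in range(0, len(signal) - region_len + 1, region_len)]
    if not regions:
        raise ValueError("signal is shorter than one region")
    flags = [region_index(r) > threshold for r in regions]
    return int(sum(flags) / len(flags) >= min_fraction)


def binary_cross_entropy(p: float, y: int, eps: float = 1e-7) -> float:
    """Standard BCE between a predicted probability p and a hard label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() never receives 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```

With `region_len=8` and the default `threshold=1.0`, an alternating ±1 signal such as `[1.0, -1.0] * 8` receives label 0. A signal whose regions jump from low to high amplitude, such as `([0.1] * 4 + [1.0] * 4) * 2`, receives label 1. In a real pipeline, the toy index would be swapped for an actual INS computation and the fixed threshold for the adaptive criteria the paper describes.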
Abstract
Objective non-stationarity measures are resource-intensive and impose critical limitations on real-time processing solutions. In this paper, a novel Hard Label Criteria (HLC) algorithm is proposed to generate a global non-stationarity label for acoustic signals, enabling supervised learning strategies to be trained as stationarity estimators. The HLC is first evaluated on state-of-the-art general-purpose acoustic models, demonstrating that these models capture stationarity information. Furthermore, the first-of-its-kind HLC-based Network for Acoustic Non-Stationarity Assessment (NANSA) is proposed. NANSA models outperform competing approaches, achieving up to 99% classification accuracy while overcoming the computational infeasibility of traditional objective measures.
