Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models

Guilherme Zucatelli, Ricardo Barioni, Gabriela Dantas

TL;DR

Traditional Index of Non-Stationarity (INS) measures make acoustic non-stationarity assessment computationally expensive. The authors introduce Hard Label Criteria (HLC), which derives a single global binary non-stationarity label per signal from region-wise INS evaluation and adaptive thresholds, so that supervised models can be trained to estimate non-stationarity efficiently. They design NANSA, a transformer-based network with an ANS Encoder and Pattern Extractor trained with binary cross-entropy on HLC labels, plus a lightweight NANSALW variant. Across AudioSet, DCASE, and FSD50K, NANSA and NANSALW consistently surpass baselines in accuracy and reduce processing time by orders of magnitude relative to INS, enabling real-time, on-device non-stationarity assessment. The result is a scalable framework for objective non-stationarity estimation, with practical relevance to speech, audio scene analysis, and related tasks.
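The training objective named above, binary cross-entropy against hard 0/1 labels, can be illustrated with a minimal sketch. This is only the standard loss definition, not the authors' training code: the function name `binary_cross_entropy`, the `eps` clipping value, and the toy inputs are assumptions made for illustration, and the labels here merely stand in for HLC outputs.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between hard 0/1 labels and predicted probabilities.

    `labels` stands in for per-signal hard labels (1 = non-stationary,
    0 = stationary; this encoding is an assumption). `probs` are model
    outputs in [0, 1].
    """
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    hard_labels = [1, 0, 1, 0]            # hypothetical labels
    good_preds = [0.9, 0.1, 0.8, 0.2]     # mostly agrees with the labels
    bad_preds = [0.2, 0.8, 0.1, 0.9]      # mostly disagrees with the labels
    print(binary_cross_entropy(hard_labels, good_preds))
    print(binary_cross_entropy(hard_labels, bad_preds))
```

Predictions that agree with the hard labels give a lower loss than predictions that contradict them, which is the signal a supervised estimator would be optimized against.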

Abstract

Objective non-stationarity measures are resource-intensive and impose critical limitations on real-time processing solutions. In this paper, a novel Hard Label Criteria (HLC) algorithm is proposed to generate a global non-stationarity label for acoustic signals, enabling supervised learning strategies to be trained as stationarity estimators. The HLC is first evaluated on state-of-the-art general-purpose acoustic models, demonstrating that these models capture stationarity information. Furthermore, the first-of-its-kind HLC-based Network for Acoustic Non-Stationarity Assessment (NANSA) is proposed. NANSA models outperform competing approaches, achieving up to 99% classification accuracy, while solving the computational infeasibility of traditional objective measures.

Paper Structure

This paper contains 11 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Sample spectrogram signals and corresponding INS values extracted from AudioSet eval dataset: Noisy Speech (a), Wooden Knock (b) and Blowing Wind (c).
  • Figure 2: The NANSA model diagram.
  • Figure 3: ROC curves and Area Under Curve (AUC) for acoustic non-stationarity assessment.