Test-Time Training for Speech Enhancement
Avishkar Behera, Riya Ann Easow, Venkatesh Parvathala, K. Sri Rama Murty
TL;DR
This work applies Test-Time Training (TTT) to speech enhancement (SE) to handle unpredictable noise and domain shifts. A Y-shaped architecture couples a main denoising task with a self-supervised auxiliary task through a shared encoder. During training, the model minimizes $\mathcal{L}_m + \mathcal{L}_s$; during inference, it adapts by minimizing $\mathcal{L}_s$ with a pseudo-label, yielding updated parameters $\theta^*_e$ and $\theta^*_s$, and then predicts with the main branch. The study compares two auxiliary-task variants, MSP (masked spectrogram prediction) and NyTT (the noise-augmented reconstruction task), across four adaptation strategies on synthetic Valentini data and real DNS data. NyTT-real best preserves speech content, NyTT-gaussian is strongest at noise suppression, online-batch strategies give strong cross-domain gains, and bias-only updates offer efficiency. The results indicate that SE models can adapt in near real time to domain shifts without labeled targets, which suggests practical deployment potential and avenues for personalization or integration with SOTA models.
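Written out, the two phases in the summary above take roughly the following form. The main-head parameters $\theta_m$, the network symbol $f$, the training pair $(x, y)$, the test input $x'$, and the pseudo-label $\tilde{y}$ are notation assumed here for illustration; only $\mathcal{L}_m$, $\mathcal{L}_s$, $\theta^*_e$, and $\theta^*_s$ come from the text.

$$
\begin{aligned}
\text{Training:}\quad & \min_{\theta_e,\,\theta_m,\,\theta_s}\ \mathcal{L}_m(x, y;\ \theta_e, \theta_m) + \mathcal{L}_s(x;\ \theta_e, \theta_s) \\
\text{Test-time adaptation:}\quad & (\theta^*_e,\ \theta^*_s) = \arg\min_{\theta_e,\,\theta_s}\ \mathcal{L}_s(x', \tilde{y};\ \theta_e, \theta_s) \\
\text{Prediction:}\quad & \hat{y} = f(x';\ \theta^*_e, \theta_m)
\end{aligned}
$$

The main head $\theta_m$ is left unchanged at test time; only the shared encoder and the auxiliary head move.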
Abstract
This paper introduces a novel application of Test-Time Training (TTT) for Speech Enhancement, addressing the challenges posed by unpredictable noise conditions and domain shifts. The method combines a main speech enhancement task with a self-supervised auxiliary task in a Y-shaped architecture. The model dynamically adapts to new domains at inference time by optimizing a self-supervised task, such as noise-augmented signal reconstruction or masked spectrogram prediction, which removes the need for labeled data. We further introduce various TTT strategies offering a trade-off between adaptation and efficiency. Evaluations on synthetic and real-world datasets show consistent improvements over the baseline model across speech quality metrics. This work highlights the effectiveness of TTT in speech enhancement, providing insights for future research in adaptive and robust speech processing.
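The adaptation step described above can be illustrated with a toy NumPy sketch. It is not the paper's model: the linear layers, the random stand-in weights and input, the masking rate, the learning rate, and the step count are all assumptions chosen only to show the mechanics of updating the shared encoder and the auxiliary head on a masked-reconstruction loss while the main head stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_bins, n_hidden = 16, 8, 4  # illustrative sizes, not from the paper

# Hypothetical "pretrained" weights of a Y-shaped linear model (random stand-ins).
W_e = 0.1 * rng.standard_normal((n_bins, n_hidden))  # shared encoder
W_m = 0.1 * rng.standard_normal((n_hidden, n_bins))  # main (enhancement) head
W_s = 0.1 * rng.standard_normal((n_hidden, n_bins))  # self-supervised head


def self_supervised_loss(x, mask, W_e, W_s):
    """Masked reconstruction: predict the full input from its masked copy."""
    z = (x * mask) @ W_e
    residual = z @ W_s - x
    return np.mean(residual ** 2), z, residual


def adapt(x, mask, W_e, W_s, lr=0.05, steps=20):
    """Gradient descent on the auxiliary loss; the main head is never touched."""
    W_e, W_s = W_e.copy(), W_s.copy()
    for _ in range(steps):
        _, z, residual = self_supervised_loss(x, mask, W_e, W_s)
        g_out = 2.0 * residual / residual.size   # dL/d(output) for the mean squared error
        g_W_s = z.T @ g_out                      # gradient w.r.t. the auxiliary head
        g_W_e = (x * mask).T @ (g_out @ W_s.T)   # gradient w.r.t. the shared encoder
        W_e -= lr * g_W_e
        W_s -= lr * g_W_s
    return W_e, W_s


# Stand-in for an unlabeled noisy test spectrogram and a fixed random mask.
x_test = rng.standard_normal((n_frames, n_bins))
mask = (rng.random((n_frames, n_bins)) > 0.25).astype(float)

loss_before, _, _ = self_supervised_loss(x_test, mask, W_e, W_s)
W_e_star, W_s_star = adapt(x_test, mask, W_e, W_s)
loss_after, _, _ = self_supervised_loss(x_test, mask, W_e_star, W_s_star)

# Prediction uses the adapted encoder together with the unchanged main head.
enhanced = x_test @ W_e_star @ W_m
```

Comparing `loss_before` with `loss_after` shows the auxiliary loss dropping on the test input without any clean target, which is the property the TTT strategies rely on. Restricting the update to a subset of parameters, as in a bias-only strategy, would amount to applying the same loop to fewer variables.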
