Nonlinear Framework for Speech Bandwidth Extension
Tarikul Islam Tamiti, Nursad Mamun, Anomadarshi Barua
TL;DR
This work tackles speech bandwidth extension by pairing chaos-informed nonlinear discriminators with a dual-stream complex-valued generator. The NDSI-BWE framework introduces seven discriminators that capture deterministic chaos, fractal dynamics, recurrence, and phase-space structure to provide rich, multi-scale feedback, while the generator (based on ConformerNeXt with Lattice-Net) refines both magnitude and phase. Empirical results on the CSTR VCTK corpus show the proposed model achieves state-of-the-art perceptual quality across multiple frequency-extension ranges, with significant parameter and compute reductions relative to baselines. The combination yields superior NISQA-MOS, PESQ, and STOI scores and demonstrates strong potential for real-time deployment in TTS/ASR pipelines, albeit with noted limitations in cross-lingual generalization and ethical considerations around misuse.
Abstract
Recovering high-frequency components lost to bandwidth constraints is crucial for applications ranging from telecommunications to high-fidelity audio under limited resources. We introduce NDSI-BWE, a new adversarial bandwidth extension (BWE) framework that leverages four new discriminators inspired by nonlinear dynamical systems to capture diverse temporal behaviors: a Multi-Resolution Lyapunov Discriminator (MRLD), which captures deterministic chaos by measuring sensitivity to initial conditions; a Multi-Scale Recurrence Discriminator (MS-RD) for self-similar recurrence dynamics; a Multi-Scale Detrended Fractal Analysis Discriminator (MSDFA) for long-range, slowly varying, scale-invariant relationships; and a Multi-Resolution Poincaré Plot Discriminator (MR-PPD) for hidden latent-space relationships. These are combined with a Multi-Period Discriminator (MPD) for cyclical patterns and with a Multi-Resolution Amplitude Discriminator (MRAD) and a Multi-Resolution Phase Discriminator (MRPD) for intricate amplitude-phase transition statistics. By using depth-wise convolution at the core of the convolutional block within each discriminator, NDSI-BWE attains an eightfold parameter reduction. These seven discriminators guide a complex-valued ConformerNeXt-based generator with a dual-stream Lattice-Net-based architecture for simultaneous refinement of magnitude and phase. The generator leverages the transformer-based Conformer's global dependency modeling and the ConvNeXt block's local temporal modeling capability. Across six objective evaluation metrics and subjective tests comprising five human judges, NDSI-BWE establishes a new SoTA in BWE.
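To make the parameter saving from depth-wise convolution concrete, the following is a minimal counting sketch. It is not the authors' discriminator code. The channel width, the kernel size, and the helper name `conv1d_params` are hypothetical choices for illustration, and the example does not attempt to reproduce the eightfold figure reported above. It assumes the common formulation in which a depth-wise convolution (one filter per channel) is followed by a 1x1 point-wise convolution.

```python
def conv1d_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Count learnable parameters of a 1-D convolution layer.

    The weight tensor has shape (c_out, c_in // groups, kernel_size),
    plus one bias term per output channel when bias is enabled.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = c_out * (c_in // groups) * kernel_size
    return weights + (c_out if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
channels, kernel_size = 64, 7

# Standard convolution: every output channel sees every input channel.
standard = conv1d_params(channels, channels, kernel_size)

# Depth-wise convolution: groups == channels, one filter per channel.
depthwise = conv1d_params(channels, channels, kernel_size, groups=channels)

# Point-wise (1x1) convolution that mixes channels after the depth-wise step.
pointwise = conv1d_params(channels, channels, 1)

separable = depthwise + pointwise
print(f"standard:  {standard}")
print(f"separable: {separable}")
print(f"reduction: {standard / separable:.2f}x")
```

The ratio depends on the kernel size and channel width, so the actual saving in a given discriminator is determined by its layer configuration.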
