dCoNNear: An Artifact-Free Neural Network Architecture for Closed-loop Audio Signal Processing

Chuan Wen, Guy Torfs, Sarah Verhulst

TL;DR

dCoNNear addresses artifacts in DNN-based closed-loop audio processing by removing the downsampling and upsampling steps and replacing them with a fully convolutional stack of FIR-like memory blocks that are dilated along the time axis. The architecture faithfully models cochlear, inner-hair-cell (IHC), and auditory-nerve-fiber (ANF) processing while supporting personalized hearing-aid (HA) and speech-enhancement applications. Training separates learning of the auditory modules from HA optimization and uses a combined loss to align normal-hearing (NH) and hearing-impaired (HI) responses. Empirical results show substantial reductions in tonal and imaging artifacts, preserved biophysical properties, competitive restoration performance, real-time inference capability, and improved perceptual quality metrics across tasks. The work positions dCoNNear as a robust, generalizable, artifact-free framework for high-fidelity closed-loop audio processing, with potential extension to other wave-to-wave audio domains.
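The idea of a resampling-free stack of dilated FIR filters can be illustrated with a minimal sketch. This is not the dCoNNear memory block itself, whose internal design is not given in this summary; the 3-tap kernel, the dilation schedule `[1, 2, 4, 8]`, and all function names are assumptions chosen for illustration. The sketch shows two properties that follow from the description: the output keeps the input's length and sample rate, and the temporal context (receptive field) grows with the dilations rather than with downsampling.

```python
def dilated_causal_fir(x, taps, dilation):
    """Causal FIR filter whose taps are spaced `dilation` samples apart."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(taps):
            idx = n - k * dilation  # look back k * dilation samples
            if idx >= 0:            # samples before t = 0 are treated as zero
                acc += w * x[idx]
        y.append(acc)
    return y


def run_stack(x, taps, dilations):
    """Apply one dilated FIR layer per dilation, with no resampling in between."""
    for d in dilations:
        x = dilated_causal_fir(x, taps, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings for illustration only (not taken from the paper).
TAPS = [1.0, 1.0, 1.0]
DILATIONS = [1, 2, 4, 8]

impulse = [1.0] + [0.0] * 63
out = run_stack(impulse, TAPS, DILATIONS)

rf = receptive_field(len(TAPS), DILATIONS)
last_nonzero = max(i for i, v in enumerate(out) if v != 0.0)

print(len(impulse), len(out))  # same number of samples in and out
print(rf, last_nonzero)        # the impulse response spans rf samples
```

Because every layer runs at the input sample rate, the stack contains no decimation or interpolation stage, which is where aliasing and imaging would otherwise be introduced.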

Abstract

Recent advances in deep neural networks (DNNs) have significantly improved various audio processing applications, including speech enhancement, synthesis, and hearing-aid algorithms. DNN-based closed-loop systems have gained popularity in these applications due to their robust performance and ability to adapt to diverse conditions. Despite their effectiveness, current DNN-based closed-loop systems often suffer from sound quality degradation caused by artifacts introduced by suboptimal sampling methods. To address this challenge, we introduce dCoNNear, a novel DNN architecture designed for seamless integration into closed-loop frameworks. This architecture specifically aims to prevent the generation of spurious artifacts, most notably tonal and aliasing artifacts arising from non-ideal sampling layers. We demonstrate the effectiveness of dCoNNear through a proof-of-principle example within a closed-loop framework that employs biophysically realistic models of auditory processing for both normal and hearing-impaired profiles to design personalized hearing-aid algorithms. We further validate the broader applicability and artifact-free performance of dCoNNear through speech-enhancement experiments, confirming its ability to improve perceptual sound quality without introducing architecture-induced artifacts. Our results show that dCoNNear not only accurately simulates all processing stages of existing non-DNN biophysical models but also significantly improves sound quality by eliminating audible artifacts in both hearing-aid and speech-enhancement applications. This study offers a robust, perceptually transparent closed-loop processing framework for high-fidelity audio applications.
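A textbook example shows why a non-ideal sampling layer can add tonal components. The sketch below does not reproduce any layer from the paper: it uses plain zero-insertion upsampling by a factor of 2 with no interpolation filter, and the signal length, tone bin, and helper names are assumptions for illustration. A single tone goes in, and the upsampled signal contains an additional spectral image mirrored around the original Nyquist frequency.

```python
import cmath
import math


def dft_magnitudes(x):
    """One-sided DFT magnitude spectrum (bins 0 .. len(x) // 2)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def zero_insert_upsample(x, factor):
    """Upsample by inserting factor - 1 zeros after each sample (no filtering)."""
    y = [0.0] * (len(x) * factor)
    y[::factor] = x
    return y


# Hypothetical test signal: one cosine occupying exactly bin 4 of a 32-point DFT.
N = 32
TONE_BIN = 4
tone = [math.cos(2 * math.pi * TONE_BIN * n / N) for n in range(N)]

upsampled = zero_insert_upsample(tone, 2)
mags = dft_magnitudes(upsampled)

# Bins of the 64-point spectrum that carry significant energy.
peaks = [k for k, m in enumerate(mags) if m > 1.0]
print(peaks)
```

In the 64-point spectrum of the upsampled signal, the original tone sits at bin 4 and its image at bin 28, that is, at 32 - 4, where bin 32 is the new Nyquist bin. An ideal interpolation filter would remove the image; a learned, imperfect one may leave part of it audible, which is one way to read the tonal artifacts the abstract attributes to non-ideal sampling layers.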

Paper Structure

This paper contains 29 sections, 7 equations, 16 figures, and 6 tables.

Figures (16)

  • Figure 1: Block diagram of the closed-loop framework for audio applications.
  • Figure 2: Generic diagram of the closed-loop framework for designing individualized hearing-aid algorithms. The three auditory modules—Cochlear, IHC, and ANF—are implemented using deep neural network architectures (e.g., CoNNear and dCoNNear).
  • Figure 3: Comparison of the artifacts produced by different upsampling methods when training to simulate the transmission-line (TL) model. The plots show the magnitude spectrum of basilar-membrane (BM) displacement outputs at a center frequency of 1 kHz. From top to bottom, the stimuli are a step input and a 1-kHz pure tone at 70 dB SPL.
  • Figure 4: (a-c) The 1-kHz tone response at different auditory processing stages compared against the target model for normal hearing. (d) The 1-kHz tonal input against the output of the HA model trained with the CoNNear-based framework.
  • Figure 5: (a) The block diagram of the dCoNNear. (b) The diagram of the memory block.
  • ...and 11 more figures