dCoNNear: An Artifact-Free Neural Network Architecture for Closed-loop Audio Signal Processing

Chuan Wen; Guy Torfs; Sarah Verhulst

TL;DR

dCoNNear addresses artifacts in DNN-based closed-loop audio processing by removing the downsampling and upsampling steps and replacing them with a fully convolutional stack of FIR-like memory blocks that use temporal dilation. The architecture faithfully models cochlear, inner-hair-cell (IHC), and auditory-nerve-fiber (ANF) processing while supporting personalized hearing-aid (HA) and speech-enhancement applications. Training separates learning of the auditory modules from HA optimization and uses a combined loss to align normal-hearing (NH) and hearing-impaired (HI) responses. Empirical results show substantial reductions in tonal and imaging artifacts, preserved biophysical properties, competitive restoration performance, real-time inference, and improved perceptual quality metrics across tasks. The work presents artifact-free dCoNNear as a robust, generalizable framework for high-fidelity closed-loop audio processing, with potential extension to other wave-to-wave audio domains.
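The idea of a sampling-free stack of dilated FIR-like layers can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the dCoNNear implementation: the kernel size, the dilation schedule (1, 2, 4), the `tanh` nonlinearity, and the function names are all hypothetical. It shows that such a stack keeps the output at the input length and that its memory (receptive field) grows with the dilations.

```python
import numpy as np

def dilated_fir(x, taps, dilation):
    """Causal FIR filter whose taps are spaced `dilation` samples apart."""
    y = np.zeros_like(x, dtype=float)
    for k, w in enumerate(taps):
        shift = k * dilation
        if shift < len(x):
            # Delay the input by `shift` samples and accumulate the weighted copy.
            y[shift:] += w * x[:len(x) - shift]
    return y

def receptive_field(kernel_size, dilations):
    """Number of past-and-present input samples one output sample can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def stack(x, taps_per_layer, dilations):
    """Apply the layers in sequence; no resampling, so the length never changes."""
    for taps, d in zip(taps_per_layer, dilations):
        x = np.tanh(dilated_fir(x, taps, d))  # tanh is an assumed nonlinearity
    return x

# Hypothetical configuration: three layers, kernel size 3, dilations 1, 2, 4.
dilations = [1, 2, 4]
taps_per_layer = [np.ones(3) for _ in dilations]

impulse = np.zeros(32)
impulse[0] = 1.0
response = stack(impulse, taps_per_layer, dilations)

print(len(response))                  # same length as the input
print(receptive_field(3, dilations))  # samples of memory
print(np.count_nonzero(response))     # support of the impulse response
```

Because every layer runs at the input sampling rate, there is no decimation or interpolation stage in which aliasing or imaging components could arise; the temporal context comes from the dilation instead.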

Abstract

Recent advances in deep neural networks (DNNs) have significantly improved various audio processing applications, including speech enhancement, synthesis, and hearing-aid algorithms. DNN-based closed-loop systems have gained popularity in these applications due to their robust performance and ability to adapt to diverse conditions. Despite their effectiveness, current DNN-based closed-loop systems often suffer from sound quality degradation caused by artifacts introduced by suboptimal sampling methods. To address this challenge, we introduce dCoNNear, a novel DNN architecture designed for seamless integration into closed-loop frameworks. This architecture specifically aims to prevent the generation of spurious artifacts, most notably tonal and aliasing artifacts arising from non-ideal sampling layers. We demonstrate the effectiveness of dCoNNear through a proof-of-principle example within a closed-loop framework that employs biophysically realistic models of auditory processing for both normal and hearing-impaired profiles to design personalized hearing-aid algorithms. We further validate the broader applicability and artifact-free performance of dCoNNear through speech-enhancement experiments, confirming its ability to improve perceptual sound quality without introducing architecture-induced artifacts. Our results show that dCoNNear not only accurately simulates all processing stages of existing non-DNN biophysical models but also significantly improves sound quality by eliminating audible artifacts in both hearing-aid and speech-enhancement applications. This study offers a robust, perceptually transparent closed-loop processing framework for high-fidelity audio applications.
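To make the notion of artifacts from non-ideal sampling layers concrete, the following sketch reproduces the textbook imaging effect of zero-insertion upsampling without an ideal interpolation filter. It is a generic signal-processing illustration, not an experiment from the paper: the sampling rates, signal length, threshold, and function names are assumptions chosen so the tone falls exactly on an FFT bin. A 1 kHz tone sampled at 8 kHz is upsampled by 2, and a spurious spectral image appears at 7 kHz.

```python
import numpy as np

def zero_stuff(x, factor):
    """Upsample by inserting factor-1 zeros between samples (no interpolation filter)."""
    y = np.zeros(len(x) * factor)
    y[::factor] = x
    return y

def peak_frequencies(x, fs, rel_threshold=0.5):
    """Frequencies whose magnitude exceeds rel_threshold times the spectral maximum."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[spectrum > rel_threshold * spectrum.max()]

# Assumed values for illustration only.
fs_low = 8000   # original sampling rate in Hz
f0 = 1000       # tone frequency in Hz
n = 800         # 100 full periods, so the tone sits exactly on an FFT bin

t = np.arange(n) / fs_low
tone = np.sin(2 * np.pi * f0 * t)

upsampled = zero_stuff(tone, 2)
peaks = peak_frequencies(upsampled, fs=2 * fs_low)
print(peaks)  # the original tone plus its image at fs_low - f0
```

A sufficiently sharp low-pass filter after the zero insertion would suppress the image; learned upsampling layers generally do not guarantee that, which is one way such spurious components can end up in the output of a resampling network.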

Paper Structure (29 sections, 7 equations, 16 figures, 6 tables)

Introduction
Characteristics of upsampling artifacts
Artifact-free dCoNNear-based closed-loop system
dCoNNear Architecture
Auditory modules and HA model
Individualisation of hearing impairment
Training strategy
Training auditory elements
Training dCoNNear_cochlear
Training dCoNNear_IHC and dCoNNear_ANF
Training HA model
Loss function
Hearing impaired elements
Training setup
Evaluation
...and 14 more sections

Figures (16)

Figure 1: Block diagram of the closed-loop framework for audio applications.
Figure 2: Generic diagram of the closed-loop framework for designing individualized hearing-aid algorithms. The three auditory modules—Cochlear, IHC, and ANF—are implemented using deep neural network architectures (e.g., CoNNear and dCoNNear).
Figure 3: Comparing the artifacts between different upsampling methods when training to simulate the TL model. The plots display the magnitude spectrum of BM displacement outputs at a center frequency of 1 kHz. From top to bottom, the stimuli consist of a step input and a 1 kHz pure tone at 70 dB SPL, respectively.
Figure 4: (a-c) The 1 kHz tone response at different auditory processing stages compared against the target model for normal hearing. (d) The 1 kHz tonal input against the output of the HA model trained within the CoNNear-based framework.
Figure 5: (a) Block diagram of dCoNNear. (b) Diagram of the memory block.
...and 11 more figures
