Resampling Filter Design for Multirate Neural Audio Effect Processing
Alistair Carson, Vesa Välimäki, Alec Wright, Stefan Bilbao
TL;DR
This work tackles the problem of fixed training sample rates in neural audio effect models. Instead of modifying the network, it resamples the input so that the RNN runs at its training rate, then resamples the output back to the original rate. It formalizes a resampling framework with pre- and post-processing filters and compares four designs for 44.1 kHz ↔ 48 kHz conversion: NB-Kaiser, NB-Remez, HB-IIR + WB-Kaiser, and HB-IIR + WB-Remez. The two-stage designs offer lower latency while preserving harmonic content. The study extends to integer oversampling (M = 2, 4, 8), evaluating cascaded half-band (HB) IIR/FIR designs, EQ-Linterp, CIC, and FFT-based references, and shows that cascaded HB designs can achieve similar or better aliasing control than FFT-based resampling at a fraction of the computational cost and latency. Across models trained at either rate, the resampling methods achieve competitive ESR, MESR, ASR, and NMR scores, with Kaiser-based designs often providing robust aliasing suppression at low latency. This makes resampling a practical alternative to sample-rate-independent RNN (SRIRNN) model adjustment for real-time neural audio effects.
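The resample, process, resample framework can be sketched offline with SciPy's polyphase resampler. This is a minimal illustration under stated assumptions: the model is assumed to be trained at 48 kHz and the host runs at 44.1 kHz; `toy_model` is a hypothetical stand-in for the trained RNN (a one-pole low-pass followed by tanh); and `scipy.signal.resample_poly` with its Kaiser-windowed FIR is swapped in for the NB and two-stage designs compared in the paper. Because `resample_poly` compensates its own filter delay over the whole buffer, the sketch does not model the latency of a causal real-time implementation.

```python
import numpy as np
from scipy.signal import lfilter, resample_poly

FS_HOST = 44100      # rate of the incoming audio (Hz)
FS_MODEL = 48000     # rate the hypothetical model was trained at (Hz)
UP, DOWN = 160, 147  # 44100 * 160 / 147 == 48000


def toy_model(x):
    """Hypothetical stand-in for a trained RNN: one-pole low-pass, then tanh.

    The filter coefficient is computed from FS_MODEL, mimicking how a trained
    network implicitly encodes its training sample rate.
    """
    fc = 5000.0  # illustrative cutoff frequency (Hz)
    a = np.exp(-2.0 * np.pi * fc / FS_MODEL)
    y = lfilter([1.0 - a], [1.0, -a], x)
    return np.tanh(3.0 * y)


def process_at_model_rate(x, model, beta=5.0):
    """Resample host-rate audio to the model rate, run the model, resample back.

    resample_poly builds a Kaiser-windowed FIR and compensates its delay over
    the whole buffer, so this is an offline (non-causal) sketch.
    """
    window = ("kaiser", beta)
    x_model = resample_poly(x, UP, DOWN, window=window)       # 44.1 kHz -> 48 kHz
    y_model = model(x_model)                                  # model runs at its training rate
    y_host = resample_poly(y_model, DOWN, UP, window=window)  # 48 kHz -> 44.1 kHz
    return y_host[: len(x)]  # guard against rounding in the output length


# Usage: one second of a 1 kHz tone at the host rate.
t = np.arange(FS_HOST) / FS_HOST
x = 0.5 * np.sin(2.0 * np.pi * 1000.0 * t)
y = process_at_model_rate(x, toy_model)
```

For a fixed filter length, raising `beta` increases stopband attenuation (less aliasing) but widens the transition band. The length of the filter that `resample_poly` designs is set by the up/down factors unless an explicit tap array is passed as `window`.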
Abstract
Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to the recurrent neural network (RNN) architecture to approximate a sample-rate-independent system, enabling audio processing at a rate that differs from the original training rate. This method works well for integer oversampling and can reduce aliasing caused by nonlinear activation functions. For small fractional changes in sample rate, fractional delay filters can be used to approximate sample rate independence, but in some cases this method fails entirely. Here, we explore the use of real-time signal resampling at the input and output of the neural network as an alternative solution. We investigate several resampling filter designs and show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give results similar to or better than those of the previously proposed model adjustment method, with far fewer filtering operations per sample and less than one millisecond of latency at typical audio rates. Furthermore, we investigate interpolation and decimation filters for the task of integer oversampling and show that cascaded half-band IIR and FIR designs can be used in conjunction with the model adjustment method to reduce aliasing in a range of distortion effect models.
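Integer oversampling with cascaded half-band stages can be sketched as repeated 2x interpolation, the nonlinear processing at the high rate, and repeated 2x decimation. This is a minimal offline sketch under stated assumptions: each stage is a Kaiser-windowed-sinc half-band FIR (63 taps, beta = 8, both illustrative), not the half-band IIR or other designs evaluated in the paper; a memoryless `tanh` stands in for the model (in the paper's setting this would be the RNN adjusted to run at the oversampled rate); and all helper names are hypothetical. Filtering is delay-compensated over the whole buffer, so per-stage latency is not modelled.

```python
import numpy as np


def halfband_fir(numtaps=63, beta=8.0):
    """Kaiser-windowed-sinc half-band low-pass, cutoff at a quarter of the sample rate.

    For odd numtaps every second tap away from the centre is (numerically) zero,
    which is what makes half-band stages cheap.
    """
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd")
    n = np.arange(numtaps) - (numtaps - 1) // 2
    h = 0.5 * np.sinc(0.5 * n) * np.kaiser(numtaps, beta)
    return h / h.sum()  # unity gain at DC


def _filter_centered(x, h):
    """Delay-compensated (zero-phase, non-causal) FIR filtering for odd, symmetric h."""
    d = (len(h) - 1) // 2
    return np.convolve(x, h)[d : d + len(x)]


def upsample2(x, h):
    xu = np.zeros(2 * len(x))
    xu[::2] = x                           # insert a zero between samples
    return _filter_centered(xu, 2.0 * h)  # gain of 2 restores the baseband level


def downsample2(y, h):
    return _filter_centered(y, h)[::2]    # anti-alias filter, then keep every 2nd sample


def oversampled(nonlinearity, x, stages=3, h=None):
    """Run `nonlinearity` at 2**stages times the input rate via cascaded 2x stages."""
    h = halfband_fir() if h is None else h
    for _ in range(stages):
        x = upsample2(x, h)
    y = nonlinearity(x)
    for _ in range(stages):
        y = downsample2(y, h)
    return y


# Usage: a strongly driven tanh on a tone whose 9th harmonic lies above Nyquist.
N, k0 = 4096, 287                 # tone sits exactly on FFT bin k0
x = np.sin(2.0 * np.pi * k0 * np.arange(N + 2000) / N)
drive = lambda v: np.tanh(3.0 * v)
y_naive = drive(x)                # harmonics >= 9 fold back as aliases
y_os = oversampled(drive, x)      # M = 8


def alias_ratio(y):
    """Fraction of energy outside the harmonic bins, on an edge-free interior segment."""
    seg = y[1000 : 1000 + N]
    p = np.abs(np.fft.rfft(seg)) ** 2
    harmonic = np.zeros(len(p), dtype=bool)
    harmonic[::k0] = True         # DC and harmonics 1..7 (bins 0, 287, ..., 2009)
    return p[~harmonic].sum() / p.sum()
```

Three stages give M = 8. A streaming version would run each stage causally (adding its centre-tap delay as latency), compute only the samples it keeps, and skip the zero-valued half-band taps; the sketch uses full convolutions for clarity.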
