Interpolation Filter Design for Sample Rate Independent Audio Effect RNNs
Alistair Carson, Alec Wright, Stefan Bilbao
TL;DR
This work tackles sample-rate independence for RNN-based audio effect models by placing fractional delay filters in the RNN delay line, changing the inference sample rate from $F_s$ to $F'_s$ and enabling both oversampling and undersampling. It compares Lagrange interpolation and minimax FIR designs (orders $K=1..5$) across $160$ GuitarML LSTM models, showing high SNR in many cases but occasional artefacts; undersampling is more problematic, and the best results sometimes come from using no interpolation at all. A linearised stability analysis around a fixed point, yielding poles $\mathbf{z}_p$ with the criterion $\max|\mathbf{z}_p| \le 1.0$, predicts whether a filter will fail with about $97.8\%$ accuracy, enabling filter selection before runtime. The findings suggest that the optimal filter choice is model-dependent, and point to future work on model-specific filter design or on adjusting network weights to achieve sample-rate independence without interpolation.
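As a rough illustration of the Lagrange designs mentioned above, the sketch below computes the taps of an order-$K$ Lagrange fractional delay FIR filter from the standard closed-form product. It assumes the one-sample feedback delay at $F_s$ corresponds to a delay of $F'_s/F_s$ samples at the new rate; the function name `lagrange_fd_coeffs` and the alignment of the taps (starting at zero delay) are illustrative choices, not taken from the paper.

```python
import numpy as np

def lagrange_fd_coeffs(delay, order):
    """Taps h[0..order] of a Lagrange FIR filter approximating z^(-delay).

    Uses h[k] = prod_{m != k} (delay - m) / (k - m).
    """
    k = np.arange(order + 1)
    h = np.ones(order + 1)
    for m in range(order + 1):
        mask = k != m  # skip the m == k factor of each product
        h[mask] *= (delay - m) / (k[mask] - m)
    return h

# Hypothetical example: a model trained at 44.1 kHz, run at 48 kHz.
delay = 48000 / 44100  # assumed feedback delay in samples at the new rate
for K in range(1, 6):
    print(K, lagrange_fd_coeffs(delay, K))
```

The taps always sum to one, so the filter has unity gain at DC, and an integer delay reduces to a pure delay with a single non-zero tap.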
Abstract
Recurrent neural networks (RNNs) are effective at emulating the non-linear, stateful behavior of analog guitar amplifiers and distortion effects. Unlike the case of direct circuit simulation, RNNs have a fixed sample rate encoded in their model weights, making the sample rate non-adjustable during inference. Recent work has proposed increasing the sample rate of RNNs at inference (oversampling) by increasing the feedback delay length in samples, using a fractional delay filter for non-integer conversions. Here, we investigate the task of lowering the sample rate at inference (undersampling), and propose using an extrapolation filter to approximate the required fractional signal advance. We consider two filter design methods and analyse the impact of filter order on audio quality. Our results show that the correct choice of filter can give high quality results for both oversampling and undersampling; however, in some cases the sample rate adjustment leads to unwanted artefacts in the output signal. We analyse these failure cases through linearised stability analysis, showing that they result from instability around a fixed point. This approach enables an informed prediction of suitable interpolation filters for a given RNN model before runtime.
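One way to picture the stability test is sketched below. It rests on assumptions not stated in the abstract: the state update is linearised as $\delta\mathbf{h}[n] = \mathbf{A}\sum_{k=1}^{M} g_k\,\delta\mathbf{h}[n-k]$, where $\mathbf{A}$ is the Jacobian of the recurrent update at the fixed point and $g_k$ are the feedback filter taps. The poles are then the roots of $z^M - \lambda \sum_k g_k z^{M-k}$ for each eigenvalue $\lambda$ of $\mathbf{A}$. The names `loop_poles` and `is_stable` are hypothetical, and the paper's actual procedure may differ.

```python
import numpy as np

def loop_poles(jacobian, taps):
    """Poles of the linearised feedback loop.

    taps[k] multiplies the state delayed by k + 1 samples (assumed layout).
    """
    taps = np.asarray(taps, dtype=complex)
    poles = []
    for lam in np.linalg.eigvals(np.atleast_2d(jacobian)):
        # Characteristic polynomial: z^M - lam * (g_1 z^(M-1) + ... + g_M)
        poly = np.concatenate(([1.0], -lam * taps))
        poles.append(np.roots(poly))
    return np.concatenate(poles)

def is_stable(jacobian, taps):
    """Apply the criterion max|z_p| <= 1.0 to the linearised loop."""
    return bool(np.max(np.abs(loop_poles(jacobian, taps))) <= 1.0)

# Toy scalar example with a two-tap linear extrapolator, g = [2, -1].
print(is_stable([[0.9]], [2.0, -1.0]))
print(is_stable([[-0.9]], [2.0, -1.0]))
```

In this toy case the same filter gives a stable loop for one Jacobian and an unstable loop for the other, which is in line with the finding that the suitable filter depends on the model.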
