Interpolation Filter Design for Sample Rate Independent Audio Effect RNNs

Alistair Carson; Alec Wright; Stefan Bilbao


TL;DR

This work tackles sample-rate independence for RNN-based audio effect models by placing fractional delay filters in the RNN delay line, so that the inference sample rate can be changed from $F_s$ to $F'_s$ for both oversampling and undersampling. It compares Lagrange interpolation and minimax FIR designs (orders $K = 1, \dots, 5$) across $160$ GuitarML LSTM models. Many cases achieve high SNR, but artefacts occur occasionally; undersampling is the more problematic direction, and the best results sometimes come from using no interpolation at all. A linearised stability analysis around a fixed point yields poles $\mathbf{z}_p$, and the stability criterion $\max|\mathbf{z}_p| \le 1.0$ predicts whether a filter will fail with about $97.8\%$ accuracy, which enables filter selection before runtime. The findings suggest that the optimal filter choice is model-dependent, and point to future work on model-specific filter design or on adjusting the network weights to achieve sample rate independence without interpolation.

Abstract

Recurrent neural networks (RNNs) are effective at emulating the non-linear, stateful behavior of analog guitar amplifiers and distortion effects. Unlike the case of direct circuit simulation, RNNs have a fixed sample rate encoded in their model weights, making the sample rate non-adjustable during inference. Recent work has proposed increasing the sample rate of RNNs at inference (oversampling) by increasing the feedback delay length in samples, using a fractional delay filter for non-integer conversions. Here, we investigate the task of lowering the sample rate at inference (undersampling), and propose using an extrapolation filter to approximate the required fractional signal advance. We consider two filter design methods and analyse the impact of filter order on audio quality. Our results show that the correct choice of filter can give high quality results for both oversampling and undersampling; however, in some cases the sample rate adjustment leads to unwanted artefacts in the output signal. We analyse these failure cases through linearised stability analysis, showing that they result from instability around a fixed point. This approach enables an informed prediction of suitable interpolation filters for a given RNN model before runtime.
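
As a rough illustration of the fractional delay and extrapolation filters mentioned above, the following sketch computes textbook Lagrange FIR weights. It is not the authors' code. The function name `lagrange_fractional_delay`, the choice of taps at integer delays 1 and 2, and the 44.1 kHz / 48 kHz rate pair are all assumptions made for the example.

```python
import numpy as np

def lagrange_fractional_delay(delay, taps):
    """Lagrange FIR weights that approximate a delay of `delay` samples
    using the samples available at the integer delays listed in `taps`.
    (Hypothetical helper; standard Lagrange polynomial weights.)"""
    taps = np.asarray(taps, dtype=float)
    h = np.ones_like(taps)
    for k, tk in enumerate(taps):
        for m, tm in enumerate(taps):
            if m != k:
                h[k] *= (delay - tm) / (tk - tm)
    return h

# Assumed rates, for illustration only.
F_TRAIN, F_HIGH = 44100.0, 48000.0

# Oversampling: one sample at the training rate spans F_HIGH / F_TRAIN > 1
# samples at the higher inference rate, so the target lies between the taps.
h_over = lagrange_fractional_delay(F_HIGH / F_TRAIN, taps=[1, 2])

# Undersampling: the target delay F_TRAIN / F_HIGH < 1 lies outside the
# available taps, so the same formula extrapolates (one weight goes negative).
h_under = lagrange_fractional_delay(F_TRAIN / F_HIGH, taps=[1, 2])
```

With two taps this reduces to linear interpolation for oversampling and linear extrapolation for undersampling. Passing a longer `taps` list gives a higher-order filter.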


Paper Structure (11 sections, 9 equations, 4 figures, 1 table)


Introduction
Problem Statement
Sample rate independent RNNs
Lagrange interpolation
Minimax design
Experiment details
Results
Linearised analysis
Experiment
Case study
Conclusions and further work

Figures (4)

Figure 1: Magnitude response (top) and phase delay error (bottom) of the candidate fractional delay filters for a) oversampling and b) undersampling.
Figure 2: Violin plots showing the distribution of SNR results for interpolation/extrapolation with the different candidate filters. The horizontal lines indicate the minima, means and maxima respectively.
Figure 3: Output spectrogram of the MatchlessSC30 model checkpoint under zero-input conditions, initially with no interpolation/extrapolation in the delay line. At 10k samples, linear extrapolation is enabled, causing the model to self-oscillate. The red dashed line shows the unstable pole angle (see Fig. 4).
Figure 4: Poles of the linearised MatchlessSC30 model a) as trained (no interpolation) and b) with linear extrapolation in the delay-line. The black dashed line represents the unit circle. The red dashed lines show the pole angle of the unstable conjugate pair.
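
The pole-based stability criterion $\max|\mathbf{z}_p| \le 1.0$ can be sketched for a toy case. The code below is a minimal illustration under stated assumptions, not the paper's procedure. It assumes the linearised recursion $\mathbf{x}[n] = \mathbf{A}\sum_k h_k\,\mathbf{x}[n-1-k]$, where $\mathbf{A}$ is a hypothetical state Jacobian at the fixed point and $h_k$ are the delay-line filter weights. The names `linearised_poles` and `is_stable`, and the $1 \times 1$ Jacobian, are made up for the example.

```python
import numpy as np

def linearised_poles(A, h):
    """Poles of the assumed linearised recursion
    x[n] = A @ sum_k h[k] * x[n-1-k].
    For each eigenvalue lam of A, the poles are the roots of
    z^(K+1) - lam * (h[0] z^K + ... + h[K]) = 0."""
    h = np.asarray(h, dtype=complex)
    poles = []
    for lam in np.linalg.eigvals(np.asarray(A, dtype=float)):
        poly = np.concatenate(([1.0], -lam * h))
        poles.extend(np.roots(poly))
    return np.array(poles)

def is_stable(A, h, tol=1e-9):
    """True when every pole lies on or inside the unit circle."""
    return bool(np.max(np.abs(linearised_poles(A, h))) <= 1.0 + tol)

# Toy 1x1 Jacobian (an assumption, not taken from any trained model).
A_toy = np.array([[-0.95]])

no_interp = [1.0]                   # delay line as trained
linear_extrap = [1.08125, -0.08125] # linear extrapolation to a delay of 44.1/48

stable_as_trained = is_stable(A_toy, no_interp)
stable_with_extrap = is_stable(A_toy, linear_extrap)
```

In this toy case the system is stable as trained but acquires a pole outside the unit circle once linear extrapolation is inserted, which is the kind of failure the criterion is meant to flag before runtime.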
