Sample Rate Independent Recurrent Neural Networks for Audio Effects Processing

Alistair Carson, Alec Wright, Jatin Chowdhury, Vesa Välimäki, Stefan Bilbao

TL;DR

This work tackles the problem of making recurrent neural network (RNN) models of audio effects, trained at a fixed sample rate (SR) $F_s$, behave faithfully at a different inference rate $F'_s = M F_s$. It compares several SR-adaptation strategies: STN, which scales the recurrent state residual, and three delay-based methods that feed the recurrent cell a state delayed by $M$ samples, namely linearly interpolated delay lines (LIDL), all-pass delay lines (APDL) and cubic Lagrange delay lines (CIDL). These are evaluated both on linear systems and on trained neural-network models. The linear analysis shows that the delay-based methods preserve the magnitude response better than STN, which introduces a low-pass tendency, and that for non-integer oversampling the higher-order interpolation of CIDL is the most accurate. Experiments with GuitarML models confirm that the delay-based approaches substantially reduce aliasing and improve the signal-to-noise ratio (SNR) relative to running the model unmodified at the new rate, with CIDL giving the best overall performance for non-integer factors. The results support SR-adaptive processing in guitar effect plugins that must run at arbitrary sample rates, enabling high-quality oversampling without retraining or storing one model per rate.
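To make the difference between the two families concrete, here is a minimal Python sketch, assuming a hypothetical scalar cell `cell(x, h)` with made-up weights in place of a trained RNN; all function names are ours. `run_stn` is one plausible reading of "scaling the state residual": the residual $f(x, h) - h$ is scaled by $1/M$. `run_delay` feeds the cell the state from $M$ samples earlier and covers integer $M$ only.

```python
import math


def cell(x: float, h: float) -> float:
    """Hypothetical scalar recurrent cell f(x, h) with made-up weights."""
    return math.tanh(0.9 * h + 0.5 * x)


def run_naive(xs: list[float], h0: float = 0.0) -> list[float]:
    """Training-rate recursion: h[n] = f(x[n], h[n-1])."""
    h, out = h0, []
    for x in xs:
        h = cell(x, h)
        out.append(h)
    return out


def run_stn(xs: list[float], M: float, h0: float = 0.0) -> list[float]:
    """STN-style update (assumed form): residual f(x, h) - h scaled by 1/M."""
    h, out = h0, []
    for x in xs:
        h = h + (cell(x, h) - h) / M
        out.append(h)
    return out


def run_delay(xs: list[float], M: int, h0: float = 0.0) -> list[float]:
    """Integer delay-line update: h[n] = f(x[n], h[n-M])."""
    past = [h0] * M  # last M states, oldest first; past[0] is h[n-M]
    out = []
    for x in xs:
        h = cell(x, past[0])
        past = past[1:] + [h]  # drop h[n-M], append h[n]
        out.append(h)
    return out
```

With a sample-and-hold input (each original sample repeated $M$ times), `run_delay` reproduces the original trajectory exactly at every phase, which is the intuition behind delay-based adaptation; with a properly band-limited upsampled input the match is only approximate.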

Abstract

In recent years, machine learning approaches to modelling guitar amplifiers and effects pedals have been widely investigated and have become standard practice in some consumer products. In particular, recurrent neural networks (RNNs) are a popular choice for modelling non-linear devices such as vacuum tube amplifiers and distortion circuitry. One limitation of such models is that they are trained on audio at a specific sample rate and therefore give unreliable results when operating at another rate. Here, we investigate several methods of modifying RNN structures to make them approximately sample rate independent, with a focus on oversampling. In the case of integer oversampling, we demonstrate that a previously proposed delay-based approach provides high fidelity sample rate conversion whilst additionally reducing aliasing. For non-integer sample rate adjustment, we propose two novel methods and show that one of these, based on cubic Lagrange interpolation of a delay-line, provides a significant improvement over existing methods. To our knowledge, this work provides the first in-depth study into this problem.
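Third-order Lagrange interpolation, the basis of CIDL, can be sketched as follows. The coefficient formula $l_k = \prod_{i \ne k} (D - i)/(k - i)$ is the standard Lagrange fractional-delay design. The tap placement in `cidl_read` (and both function names) is our own assumption, since the paper defines its delay $\gamma$ in its own equation; the sketch only ever reads states older than the $h[n]$ being computed.

```python
import math


def lagrange_coeffs(D: float, N: int = 3) -> list[float]:
    """Order-N Lagrange fractional-delay FIR taps l_0..l_N for a delay of D samples.

    l_k = prod_{i != k} (D - i) / (k - i); N = 3 gives cubic interpolation.
    """
    taps = []
    for k in range(N + 1):
        c = 1.0
        for i in range(N + 1):
            if i != k:
                c *= (D - i) / (k - i)
        taps.append(c)
    return taps


def cidl_read(history: list[float], M: float) -> float:
    """Estimate h[n - M] from past states with cubic Lagrange interpolation.

    history[-1] is h[n-1], history[-2] is h[n-2], and so on.
    Assumed tap placement: start s = max(0, floor(M) - 2) samples back from
    h[n-1], so the residual delay D = M - 1 - s lies in [1, 2) when M >= 2
    (the most accurate range for cubic Lagrange) and h[n] is never needed.
    """
    if M < 1:
        raise ValueError("oversampling factor M must be >= 1")
    s = max(0, math.floor(M) - 2)
    D = M - 1 - s
    taps = lagrange_coeffs(D)
    # tap k multiplies h[n - 1 - s - k]
    return sum(l * history[-1 - s - k] for k, l in enumerate(taps))
```

Because cubic Lagrange interpolation is exact for polynomials up to degree three, reading a cubic-shaped state history at any $M \ge 1$ returns the polynomial's value at $n - M$, which makes a convenient sanity check.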

Paper Structure (17 sections, 21 equations, 9 figures, 1 table)

Figures (9)

  • Figure 1: Baseline recurrent neural network architecture studied in this work, where $f$ is a non-linear recurrent cell and $g$ is a fully connected affine layer.
  • Figure 2: STN method: modified RNN architecture for changing the SR by a factor of $M$ by scaling the state residual [Parker2019].
  • Figure 3: LIDL method: modified RNN architecture for oversampling by a factor of $M$ by linearly interpolating between successive states. $\Delta = M - \lfloor M \rfloor$ and $\Delta' = 1 - \Delta$. Adapted from [Chowdhury2022].
  • Figure 4: Proposed APDL method: modified RNN architecture for oversampling by a non-integer factor of $M$ with an all-pass filter that implements a fractional delay. $\eta$ is the all-pass filter coefficient, defined by an equation in the main text.
  • Figure 5: Proposed CIDL method: modified RNN architecture for oversampling by a factor of $M$ with third-order Lagrange delay-line interpolation. The filter coefficients $l$ and the delay $\gamma$ are defined by equations in the main text.
  • ...and 4 more figures
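The fractional state reads in Figures 3 and 4 can be sketched in Python as below. `lidl_read` uses the $\Delta$ and $\Delta'$ of Figure 3, weighted so that the total delay is $M$ samples. For APDL we assume the standard first-order Thiran all-pass coefficient $\eta = (1 - \Delta)/(1 + \Delta)$, whose low-frequency delay is $\Delta$ samples; the paper's own definition of $\eta$ may differ. All names here are hypothetical.

```python
import math


def lidl_read(history: list[float], M: float) -> float:
    """LIDL: linearly interpolated state delayed by M samples (M >= 1).

    With Delta = M - floor(M) and Delta' = 1 - Delta, returns
    Delta' * h[n - floor(M)] + Delta * h[n - floor(M) - 1],
    where history[-1] is h[n-1].
    """
    K = math.floor(M)
    delta = M - K
    return (1.0 - delta) * history[-K] + delta * history[-K - 1]


class AllpassFractionalDelay:
    """First-order all-pass: y[n] = eta*x[n] + x[n-1] - eta*y[n-1].

    eta = (1 - delta) / (1 + delta) is the standard first-order Thiran
    design (an assumption here), giving a low-frequency delay of delta samples.
    """

    def __init__(self, delta: float):
        self.eta = (1.0 - delta) / (1.0 + delta)
        self.x1 = 0.0  # previous input x[n-1]
        self.y1 = 0.0  # previous output y[n-1]

    def process(self, x: float) -> float:
        y = self.eta * x + self.x1 - self.eta * self.y1
        self.x1, self.y1 = x, y
        return y
```

Fed a ramp $x[n] = n$, the all-pass settles to $y[n] = n - \Delta$, confirming the fractional delay. Unlike linear interpolation, which attenuates high frequencies for fractional $\Delta$, the all-pass has unit magnitude at all frequencies, at the cost of carrying a recursive state of its own.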