Hyper Recurrent Neural Network: Condition Mechanisms for Black-box Audio Effect Modeling
Yen-Tung Yeh, Wen-Yi Hsiao, Yi-Hsuan Yang
TL;DR
Conventional RNN-based virtual-analog modeling often conditions on knobs via simple concatenation, which limits expressive capacity. This work proposes three hypernetwork-based conditioning schemes—FiLM-RNN, StaticHyper-RNN, and DynamicHyper-RNN—to adapt model behavior to control parameters, and introduces a transient reconstruction metric to evaluate short-lived events. Across two devices (LA-2A and OD-3) and several objective metrics, all three methods outperform concatenation, with DynamicHyper-RNN delivering the strongest gains at higher computational cost while StaticHyper-RNN offers substantial compute savings. The study advances black-box audio effect emulation by integrating time-varying, parameter-conditioned weight modulation, and provides open data and code for reproducibility.
Abstract
Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplications and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the device's knobs, existing RNN-based approaches integrate control parameters by concatenating them channel-wise with an intermediate representation of the input signal. While this method is parameter-efficient, there is room to further improve the quality of the generated audio, because concatenation-based conditioning has limited capacity to modulate signals. In this paper, we propose three novel conditioning mechanisms for RNNs, tailored for black-box virtual analog modeling. These conditioning mechanisms modulate the model based on control parameters, yielding superior results to existing RNN- and CNN-based architectures across various evaluation metrics.
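To make the contrast between the two conditioning styles concrete, the following is a minimal illustrative sketch and not the authors' implementation. It uses a plain tanh RNN in pure Python. `ConcatRNN` appends the knob values to each input sample, while `FiLMRNN` maps the knobs to a per-channel scale and shift in the generic style of feature-wise linear modulation. All class names, layer sizes, the single-layer conditioning network, and the choice to modulate the hidden state after the nonlinearity are assumptions made for illustration; the paper's FiLM-RNN may differ in each of these.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class ConcatRNN:
    """Baseline: knob values are concatenated to every input sample."""

    def __init__(self, hidden, n_knobs, rng):
        self.hidden = hidden
        self.w_in = rand_matrix(hidden, 1 + n_knobs, rng)
        self.w_rec = rand_matrix(hidden, hidden, rng)
        self.b = [0.0] * hidden
        self.w_out = rand_matrix(1, hidden, rng)

    def forward(self, audio, knobs):
        zeros = [0.0] * self.hidden
        h, out = list(zeros), []
        for x in audio:
            pre = linear(self.w_in, self.b, [x] + list(knobs))
            rec = linear(self.w_rec, zeros, h)
            h = [math.tanh(p + r) for p, r in zip(pre, rec)]
            out.append(linear(self.w_out, [0.0], h)[0])
        return out


class FiLMRNN:
    """Assumed FiLM-style variant: knobs scale and shift the hidden state."""

    def __init__(self, hidden, n_knobs, rng):
        self.hidden = hidden
        self.w_in = rand_matrix(hidden, 1, rng)
        self.w_rec = rand_matrix(hidden, hidden, rng)
        self.b = [0.0] * hidden
        self.w_out = rand_matrix(1, hidden, rng)
        # Hypothetical one-layer conditioning network: knobs -> (gamma, beta).
        self.w_film = rand_matrix(2 * hidden, n_knobs, rng)
        # Bias chosen so that gamma = 1 and beta = 0 when all knobs are zero.
        self.b_film = [1.0] * hidden + [0.0] * hidden

    def forward(self, audio, knobs):
        film = linear(self.w_film, self.b_film, list(knobs))
        gamma, beta = film[:self.hidden], film[self.hidden:]
        zeros = [0.0] * self.hidden
        h, out = list(zeros), []
        for x in audio:
            pre = linear(self.w_in, self.b, [x])
            rec = linear(self.w_rec, zeros, h)
            raw = [math.tanh(p + r) for p, r in zip(pre, rec)]
            # Feature-wise modulation: h = gamma * raw + beta, per channel.
            h = [g * v + s for g, v, s in zip(gamma, raw, beta)]
            out.append(linear(self.w_out, [0.0], h)[0])
        return out
```

In the concatenation baseline the knobs enter only as extra input channels, so their influence passes through the same fixed input weights as the audio. In the FiLM-style variant the knobs rescale and offset every hidden channel directly, which is one simple way for control parameters to modulate the model rather than merely accompany the signal.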
