CONMOD: Controllable Neural Frame-based Modulation Effects
Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam
TL;DR
CONMOD addresses the lack of controllability in neural modelling of LFO-driven audio effects by predicting a frame-wise transfer function conditioned on LFO frequency and feedback. The approach combines an LSTM on the LFO with an MLP and FiLM conditioning to enable continuous control and to learn a shared embedding space for multiple phaser effects, which makes it possible to steer between two distinct phasers. Trained with a multi-LFO regime and a chirp-based protocol, CONMOD is more accurate than a prior baseline and is robust to unseen control settings and long audio sequences. This work enables flexible, creative neural emulations of LFO-based modulation, with potential for broader universal modelling of time-varying audio effects.
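The FiLM conditioning mentioned above can be illustrated with a minimal sketch. FiLM (feature-wise linear modulation) scales and shifts each hidden feature by values computed from the conditioning inputs. The layer sizes, weights, the choice of two controls (a normalised LFO rate and a feedback amount), and the helper names `linear` and `film` are assumptions made for illustration; the summary does not specify how CONMOD wires FiLM into its MLP.

```python
def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features, controls, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each feature using control-dependent gamma and beta."""
    gamma = linear(controls, w_gamma, b_gamma)  # per-feature scale
    beta = linear(controls, w_beta, b_beta)     # per-feature shift
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


# Hypothetical values: 3 hidden features, 2 controls
# (normalised LFO rate, feedback amount). Not taken from the paper.
features = [0.2, -1.0, 0.5]
controls = [0.25, 0.7]
w_gamma = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
b_gamma = [1.0, 1.0, 0.0]
w_beta = [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
b_beta = [0.0, 0.0, 0.1]

modulated = film(features, controls, w_gamma, b_gamma, w_beta, b_beta)
print(modulated)
```

Because gamma and beta are functions of the controls, changing a control value smoothly changes how the hidden features are scaled and shifted, which is one plausible route to continuous control.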
Abstract
Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures emulate individual effects with high quality, they cannot manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model that emulates various LFO-driven effects in a frame-wise manner and offers control over LFO frequency and feedback parameters. Additionally, the model can learn a continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs. Our model outperforms previous work while offering both controllability and universality, presenting opportunities to enhance creativity in modern LFO-driven audio effects.
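To make the notion of a frame-wise, LFO-driven transfer function concrete, the sketch below computes one frequency response per frame for a textbook digital phaser: a chain of first-order all-pass filters whose coefficient follows a sinusoidal LFO, with a feedback path around the chain. This is not the CONMOD model, which learns its responses from data. The stage count, the LFO-to-coefficient mapping, the 50/50 dry/wet mix, and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def lfo_per_frame(lfo_hz, n_frames, hop, sample_rate, phase=0.0):
    """One sinusoidal LFO value in [0, 1] per frame, sampled at frame starts."""
    return [0.5 + 0.5 * math.sin(2 * math.pi * lfo_hz * i * hop / sample_rate + phase)
            for i in range(n_frames)]


def allpass_response(a, omega):
    """First-order all-pass A(e^{jw}) = (a + e^{-jw}) / (1 + a e^{-jw})."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


def phaser_frame_response(lfo_value, feedback, n_bins, stages=4):
    """Frequency response of one frame from 0 to the Nyquist frequency."""
    # Hypothetical mapping from LFO value to all-pass coefficient in [-0.9, -0.1].
    a = -0.9 + 0.8 * lfo_value
    response = []
    for k in range(n_bins):
        omega = math.pi * k / (n_bins - 1)
        z_inv = cmath.exp(-1j * omega)
        chain = allpass_response(a, omega) ** stages
        # Feedback around the chain with a one-sample delay in the loop.
        wet = chain / (1 - feedback * z_inv * chain)
        response.append(0.5 * (1 + wet))  # equal dry/wet mix
    return response


# Four frames of a 1 Hz LFO, then one response per frame with feedback 0.5.
lfo = lfo_per_frame(lfo_hz=1.0, n_frames=4, hop=256, sample_rate=1024)
responses = [phaser_frame_response(v, feedback=0.5, n_bins=9) for v in lfo]
```

Multiplying each frame's spectrum by its response and overlap-adding the results would yield a time-varying effect. In this sketch the LFO frequency sets how fast the notches sweep and the feedback value sets how sharp they are, which corresponds to the two controls named in the abstract.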
