Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models
Riccardo Simionato, Stefano Fasciani
TL;DR
This work tackles the challenge of accurately emulating optical dynamic range compressors with a neural, low-latency approach. It introduces Selective State Space (S6) networks augmented with FiLM and TemporalFiLM conditioning to capture both the magnitude-driven compression and the device-specific timing dynamics, enabling per-sample output with minimal latency ($y_n = g x_n$, using a 64-sample input window). Across hardware (LA-2A, TubeTech CL 1B) and software emulations, the S6-based architecture outperforms LSTM, ED, S4D, and TCN baselines on multiple objective metrics and perceptual tests, especially for time-variant behaviors. The results underscore the model’s ability to generalize to unseen parameter settings and highlight the nuanced impact of control-parameter sampling density on interpolation accuracy. Overall, the method offers a practical path to real-time, high-fidelity emulation of analog optical dynamics with potential applicability to other time-variant audio effects.
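As a rough illustration of the per-sample formulation, the sketch below reads $y_n = g x_n$ as a gain $g$ applied to the current sample, with $g$ computed from the most recent 64 input samples. This reading is an assumption, and `toy_gain` is a hypothetical stand-in for the trained network (a static RMS-based curve with made-up `threshold` and `ratio` values), not the S6 model itself.

```python
import numpy as np

WINDOW = 64  # input window length mentioned in the text


def toy_gain(window, threshold=0.5, ratio=4.0):
    """Hypothetical stand-in for the network: maps one window to one gain."""
    level = np.sqrt(np.mean(window ** 2)) + 1e-12  # RMS level of the window
    if level <= threshold:
        return 1.0  # below threshold: no gain reduction
    # Static compression curve: the excess over the threshold shrinks by `ratio`.
    target = threshold * (level / threshold) ** (1.0 / ratio)
    return target / level


def process(x, gain_fn=toy_gain):
    """Emit one output sample per input sample: y[n] = g[n] * x[n]."""
    x = np.asarray(x, dtype=float)
    # Zero history before the first sample so every window has 64 entries.
    padded = np.concatenate([np.zeros(WINDOW - 1), x])
    y = np.empty_like(x)
    for n in range(len(x)):
        window = padded[n:n + WINDOW]  # the 64 samples ending at x[n]
        y[n] = gain_fn(window) * x[n]
    return y


quiet = process(np.full(200, 0.1))  # stays below the threshold
loud = process(np.ones(200))        # compressed once the window fills
```

Because each output depends only on the current and past samples, the loop can run sample by sample without look-ahead, which is what makes this formulation suitable for low-latency use.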
Abstract
This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models. The proposed approach surpasses previous methods based on recurrent layers by employing a Selective State Space block to encode the input audio. It features a refined technique that integrates Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically, conditioning the compression's attack and release phases on external parameters. The proposed architecture is well suited to low-latency and real-time applications, which are crucial in live audio processing. The method has been validated on two analog optical compressors with distinct characteristics, the TubeTech CL 1B and the Teletronix LA-2A. Evaluation is performed using quantitative metrics and subjective listening tests, comparing the proposed method with other state-of-the-art models. Results show that our black-box modeling methods outperform the compared models, achieving accurate emulation of the compression process for settings both seen and unseen during training. We further show a correlation between this accuracy and the sampling density of the control parameters in the dataset, and we identify settings with fast attack and slow release as the most challenging to emulate.
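The conditioning idea named above can be sketched in a few lines of NumPy, assuming the standard definitions of the two building blocks: FiLM scales and shifts a feature vector with parameters predicted from the control settings, and a GLU gates one half of a linear projection with the sigmoid of the other half. The order of the two operations, the layer sizes, the random weights, and the names `film`, `glu`, `h`, and `c` are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def film(h, c, W, b):
    """FiLM: scale and shift features h with parameters predicted from c."""
    gamma_beta = c @ W + b                  # shape: (2 * features,)
    gamma, beta = np.split(gamma_beta, 2)   # per-feature scale and shift
    return gamma * h + beta


def glu(z, W, b):
    """GLU: a linear projection split in two halves, one gating the other."""
    a, g = np.split(z @ W + b, 2)
    return a * sigmoid(g)


rng = np.random.default_rng(0)
features, n_params = 8, 2

h = rng.standard_normal(features)   # encoded audio features (hypothetical)
c = np.array([0.3, 0.7])            # normalized control settings (hypothetical)

# Random weights stand in for trained ones; the FiLM bias starts at identity.
W_film = 0.1 * rng.standard_normal((n_params, 2 * features))
b_film = np.concatenate([np.ones(features), np.zeros(features)])
W_glu = 0.1 * rng.standard_normal((features, 2 * features))
b_glu = np.zeros(2 * features)

out = glu(film(h, c, W_film, b_film), W_glu, b_glu)
```

With zero FiLM weights and the identity bias, `film` returns `h` unchanged, so the control parameters only reshape the features as far as training moves the weights away from that starting point.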
