DiffMoog: a Differentiable Modular Synthesizer for Sound Matching
Noy Uzrad, Oren Barkan, Almog Elharar, Shlomi Shvartzman, Moshe Laufer, Lior Wolf, Noam Koenigstein
TL;DR
This work tackles sound matching with differentiable synthesis by introducing DiffMoog, a modular differentiable synthesizer that retains traditional modules (oscillators, LFOs, filters, envelopes) and adds FM/AM capabilities with flexible routing. It pairs DiffMoog with an end-to-end platform in which an encoder predicts chain-specific parameters and a novel signal-chain spectral loss guides gradient-based optimization. The objective is formalized as $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{p}} + \beta \cdot \mathcal{L}_{\mathrm{SC}}$, with parameter loss $\mathcal{L}_{\mathrm{p}} = \sum_{n \in \mathcal{N}} L_{\text{reg}}(p_n, \hat{p}_n) + \sum_{m\in \mathcal{M}} L_{\text{cat}}(c_m, \hat{c}_m)$. The authors release DiffMoog as open source and report insights on training dynamics, including the benefit of Wasserstein-based spectral losses for frequency estimation and the value of out-of-domain data, while acknowledging remaining challenges with complex FM chains. The platform enables researchers to perform end-to-end audio optimization with interpretable, modular synths, potentially accelerating AI-assisted sound design and synthesis research.
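To make the structure of the objective concrete, the following is a minimal sketch of how $\mathcal{L}_{\mathrm{p}} + \beta \cdot \mathcal{L}_{\mathrm{SC}}$ could be assembled. It is not the authors' implementation: the function names are hypothetical, $L_{\text{reg}}$ is assumed to be a squared error, $L_{\text{cat}}$ is assumed to be a cross-entropy over predicted class probabilities, and the signal-chain loss is taken as an already-computed number rather than derived from audio.

```python
import math

def regression_loss(p, p_hat):
    # Assumed form of L_reg: squared error between true and predicted value.
    return (p - p_hat) ** 2

def categorical_loss(true_index, predicted_probs):
    # Assumed form of L_cat: cross-entropy against the true class index.
    eps = 1e-12  # guards against log(0)
    return -math.log(max(predicted_probs[true_index], eps))

def parameter_loss(continuous, categorical):
    # continuous: list of (p_n, p_hat_n) pairs, one per continuous parameter.
    # categorical: list of (c_m, predicted_probs_m) pairs, one per categorical parameter.
    reg = sum(regression_loss(p, p_hat) for p, p_hat in continuous)
    cat = sum(categorical_loss(c, probs) for c, probs in categorical)
    return reg + cat

def total_loss(continuous, categorical, signal_chain_loss, beta):
    # L_total = L_p + beta * L_SC, with L_SC supplied by the caller.
    return parameter_loss(continuous, categorical) + beta * signal_chain_loss

if __name__ == "__main__":
    # Hypothetical values: two continuous parameters, one waveform choice.
    continuous = [(0.5, 0.5), (1.0, 0.0)]
    categorical = [(0, [1.0, 0.0])]
    print(total_loss(continuous, categorical, signal_chain_loss=2.0, beta=0.5))
```

In this sketch, $\beta$ sets how strongly the spectral term is weighted against the parameter term; how that weight is chosen or scheduled is not specified here.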
Abstract
This paper presents DiffMoog, a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it can be integrated into neural networks, enabling automated sound matching, that is, replicating a given audio input. Notably, DiffMoog provides modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, envelope shapers, and the ability for users to create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework utilizes a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoog's parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned towards sound matching using differentiable synthesis. Combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.
