DiffMoog: a Differentiable Modular Synthesizer for Sound Matching

Noy Uzrad, Oren Barkan, Almog Elharar, Shlomi Shvartzman, Moshe Laufer, Lior Wolf, Noam Koenigstein

TL;DR

This work tackles sound matching with differentiable synthesis by introducing DiffMoog, a modular differentiable synthesizer that retains traditional modules (oscillators, LFOs, filters, envelopes) and adds FM/AM capabilities with flexible routing. It pairs DiffMoog with an end-to-end platform in which an encoder predicts chain-specific parameters and a novel signal-chain spectral loss guides gradient-based optimization. The training objective is $\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{p}} + \beta \cdot \mathcal{L}_{\mathrm{SC}}$, where the parameter loss $\mathcal{L}_{\mathrm{p}} = \sum_{n \in \mathcal{N}} L_{\text{reg}}(p_n, \hat{p}_n) + \sum_{m\in \mathcal{M}} L_{\text{cat}}(c_m, \hat{c}_m)$ sums a regression loss over the numeric parameters and a categorical loss over the categorical parameters, and $\mathcal{L}_{\mathrm{SC}}$ is the signal-chain loss weighted by $\beta$. The authors release DiffMoog as open source and report insights on training dynamics, including the benefit of Wasserstein-based spectral losses for frequency estimation and the value of out-of-domain data, while acknowledging remaining challenges with complex FM chains. The platform lets researchers perform end-to-end audio optimization with interpretable, modular synths, potentially accelerating AI-assisted sound design and synthesis research.
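
The objective stated above can be sketched in a few lines of Python. This is only an illustration of how the terms combine, not the authors' implementation: the squared-error choice for $L_{\text{reg}}$, the cross-entropy choice for $L_{\text{cat}}$, and all function names are assumptions, and the signal-chain loss $\mathcal{L}_{\mathrm{SC}}$ is passed in as a precomputed number rather than derived from audio.

```python
import math

def regression_loss(p, p_hat):
    # Assumed form of L_reg: squared error between true and predicted value.
    return (p - p_hat) ** 2

def categorical_loss(true_index, predicted_probs):
    # Assumed form of L_cat: cross-entropy for one categorical parameter.
    return -math.log(max(predicted_probs[true_index], 1e-12))

def parameter_loss(numeric_pairs, categorical_pairs):
    # L_p: sum over numeric parameters plus sum over categorical parameters.
    reg = sum(regression_loss(p, p_hat) for p, p_hat in numeric_pairs)
    cat = sum(categorical_loss(c, probs) for c, probs in categorical_pairs)
    return reg + cat

def total_loss(numeric_pairs, categorical_pairs, signal_chain_loss, beta):
    # L_total = L_p + beta * L_SC, with L_SC supplied by the caller.
    return parameter_loss(numeric_pairs, categorical_pairs) + beta * signal_chain_loss

# Hypothetical values: two numeric parameters and one three-way categorical choice.
numeric_pairs = [(0.5, 0.4), (0.2, 0.2)]
categorical_pairs = [(1, [0.1, 0.8, 0.1])]
example_total = total_loss(numeric_pairs, categorical_pairs,
                           signal_chain_loss=2.0, beta=0.5)
```

In this sketch, $\beta$ only sets how strongly the audio-domain term weighs against the parameter-domain term; setting `beta=0` reduces the objective to pure parameter regression and classification.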

Abstract

This paper presents DiffMoog, a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it can be integrated into neural networks, enabling automated sound matching, i.e., replicating a given audio input. Notably, DiffMoog provides modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, envelope shapers, and the ability for users to create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework uses a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoog's parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned on sound matching with differentiable synthesis. Combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.
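
To make the FM/AM terminology concrete, here is a minimal sketch of a sine carrier with textbook frequency and amplitude modulation. It is not DiffMoog's code: the function name, parameter names, and defaults are hypothetical, and it uses plain Python floats, whereas a differentiable version would express the same arithmetic in a tensor library so that gradients can flow to the parameters.

```python
import math

def fm_am_tone(carrier_hz, fm_hz, fm_index, am_hz, am_depth,
               duration_s=0.01, sample_rate=16000):
    """Render a sine carrier with sinusoidal FM and AM (illustrative only)."""
    n = round(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # FM: the modulator is added to the carrier phase, scaled by fm_index.
        phase = (2 * math.pi * carrier_hz * t
                 + fm_index * math.sin(2 * math.pi * fm_hz * t))
        # AM: the gain swings between (1 - am_depth) and 1 at the rate am_hz.
        gain = 1.0 - am_depth * 0.5 * (1.0 + math.sin(2 * math.pi * am_hz * t))
        samples.append(gain * math.sin(phase))
    return samples

# Hypothetical settings: 440 Hz carrier, 110 Hz FM, 5 Hz tremolo.
example = fm_am_tone(carrier_hz=440.0, fm_hz=110.0, fm_index=2.0,
                     am_hz=5.0, am_depth=0.5)
```

Setting `fm_index=0` and `am_depth=0` recovers a plain sine oscillator, which is a convenient check that the modulation terms are wired in as intended.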

Paper Structure (6 sections, 4 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: The DiffMoog synth with an arbitrary chain. Shown: matrix, cells, modules, connections (arrows) and parameters ('p'). Black arrows are fixed connections; gray arrows are optional connections. An empty cell outputs the zero signal.
  • Figure 2: Spectrograms of sounds synthesized with DiffMoog. Although typical, FM/AM sounds cannot be synthesized by prior differentiable synths.
  • Figure 3: The end-to-end sound matching system diagram.
  • Figure 4: The neural network architecture with dynamically allocated MLP heads.
  • Figure 5: The chain used for the experiment in Fig. \ref{fig:experiment}, with a sawtooth oscillator, a square oscillator, an amplitude ADSR, and a lowpass filter.
  • ...and 2 more figures