Modulation Discovery with Differentiable Digital Signal Processing

Christopher Mitcheltree, Hao Hao Tan, Joshua D. Reiss

TL;DR

This work addresses the challenge of extracting interpretable, time-varying modulation signals from audio produced by differentiable synthesizers. It introduces a three-stage modulation-discovery framework that combines modulation routing through a differentiable Mod.Synth, self-supervised modulation extraction via LFO-net, and differentiable modulation parameterizations (Frame, LPF, Spline) to recover human-readable modulation curves. Through experiments on synthetic and real-world audio, the study demonstrates a trade-off between interpretability and sound-matching accuracy, with LPF generally providing the best balance and Spline offering higher interpretability, while Frame offers flexibility at the cost of readability. The framework is released with code and VST plugins, enabling practical analysis and reproduction of complex modulations in music production and related domains.

Abstract

Modulations are a critical part of sound design and music production, enabling the creation of complex and evolving audio. Modern synthesizers provide envelopes, low frequency oscillators (LFOs), and other parameter automation tools that allow users to modulate the output with ease. However, determining the modulation signals used to create a sound is difficult, and existing sound-matching / parameter estimation systems are often uninterpretable black boxes or predict high-dimensional framewise parameter values without considering the shape, structure, and routing of the underlying modulation curves. We propose a neural sound-matching approach that leverages modulation extraction, constrained control signal parameterizations, and differentiable digital signal processing (DDSP) to discover the modulations present in a sound. We demonstrate the effectiveness of our approach on highly modulated synthetic and real audio samples and its applicability to different DDSP synth architectures, and we investigate the trade-off it incurs between interpretability and sound-matching accuracy. We make our code and audio samples available and provide the trained DDSP synths in a VST plugin.
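To make the idea of a constrained control signal parameterization concrete, the sketch below contrasts an unconstrained framewise signal with a low-pass filtered version of the same signal. It is an illustration only, not the authors' implementation: the one-pole filter, the `alpha` coefficient, the linear upsampling, and the function names `one_pole_lowpass` and `upsample_linear` are all assumptions, and plain Python is used for readability where a DDSP system would use differentiable tensor operations.

```python
def one_pole_lowpass(frames, alpha):
    """Smooth a framewise control signal with y[n] = alpha*x[n] + (1-alpha)*y[n-1].

    Hypothetical stand-in for a low-pass parameterization; alpha in (0, 1],
    where alpha = 1 leaves the signal unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = frames[0] if frames else 0.0  # start the filter state at the first frame
    for x in frames:
        prev = alpha * x + (1.0 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def upsample_linear(frames, n_samples):
    """Linearly interpolate a non-empty list of frame values to n_samples points."""
    if len(frames) == 1 or n_samples == 1:
        return [frames[0]] * n_samples
    out = []
    for i in range(n_samples):
        pos = i * (len(frames) - 1) / (n_samples - 1)  # fractional frame index
        lo = int(pos)
        hi = min(lo + 1, len(frames) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * frames[lo] + frac * frames[hi])
    return out


def total_variation(signal):
    """Sum of absolute frame-to-frame changes; a rough measure of jaggedness."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


# A deliberately jagged framewise signal and its smoothed counterpart.
raw_frames = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
lpf_frames = one_pole_lowpass(raw_frames, alpha=0.5)
control_signal = upsample_linear(lpf_frames, n_samples=16)
```

The filtered signal changes far less from frame to frame than the raw one, which is the sense in which a smoother parameterization can be easier to read, at the possible cost of tracking fast modulations less closely.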

Paper Structure

This paper contains 10 sections, 2 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Overview of the modulation discovery process through modulation extraction, parameterization, and routing using a DDSP synth. Orange blocks are neural networks, dashed blocks are optional, and blue blocks are differentiable and may contain learnable weights for sound matching.
  • Figure 2: A 2D drawable modulation grid in the Vital soft synth.
  • Figure 3: Modulation extraction with the framewise, low-pass filtered, and piecewise Bézier curve parameterizations. The ground truth signal is dashed and the four mod. signal distance measures from Section \ref{ssec:modulation_extraction} are computed.
  • Figure 4: Discovered additive (red), subtractive (blue), and envelope (orange) modulations in Serum test dataset audio samples using different DDSP synths and Frame (left), LPF (center), and Spline (right) mod. signal parameterizations.
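The piecewise Bézier curve parameterization named in the Figure 3 caption can be pictured with a short sketch. The listing below is a minimal illustration under stated assumptions, not the paper's method: it assumes equal-length segments, one-dimensional control values evaluated with De Casteljau's algorithm, and continuity obtained by repeating the shared endpoint in adjacent segments. The names `bezier_value` and `piecewise_bezier` are hypothetical.

```python
def bezier_value(ctrl, t):
    """Evaluate a 1D Bezier curve with control values `ctrl` at t in [0, 1].

    Uses De Casteljau's algorithm: repeatedly interpolate between
    neighbouring control values until one value remains.
    """
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [(1.0 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]


def piecewise_bezier(segments, n_samples):
    """Sample a curve made of equal-length Bezier segments (assumed layout).

    `segments` is a list of control-value lists, one per segment. Sample i
    sits at normalized time i / n_samples, so the final endpoint is excluded.
    """
    n_seg = len(segments)
    out = []
    for i in range(n_samples):
        pos = i / n_samples * n_seg        # position measured in segments
        idx = min(int(pos), n_seg - 1)     # which segment the sample falls in
        out.append(bezier_value(segments[idx], pos - idx))
    return out


# Two cubic segments: a smooth rise from 0 to 1, then a smooth fall back to 0.
# The shared endpoint value 1.0 keeps the curve continuous at the joint.
segments = [[0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
curve = piecewise_bezier(segments, n_samples=8)
```

Here eight control values describe the whole curve regardless of how many samples are drawn from it, and every sampled value stays within the range of its segment's control values. That compactness is one plausible reason a spline-style parameterization reads more easily than a long framewise sequence.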