Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Yujia Huang; Adishree Ghatare; Yuanzhe Liu; Ziniu Hu; Qinsheng Zhang; Chandramouli S Sastry; Siddharth Gururani; Sageev Oore; Yisong Yue

TL;DR

This work tackles controllable symbolic music generation under non-differentiable rules by introducing Stochastic Control Guidance (SCG), a plug-and-play mechanism that guides pre-trained diffusion models using forward rule evaluations rather than gradients. It couples SCG with a latent diffusion architecture built on a VAE-encoded piano-roll representation and a transformer-based diffusion backbone to achieve high time resolution (10 ms) suitable for musical expressivity. Theoretical grounding in stochastic optimal control and path integral methods provides a principled route to steer diffusion samples toward rule-compliant outputs, while practical algorithms enable efficient, training-free guidance even for black-box rules. Empirical results across unconditional and rule-guided tasks demonstrate superior music quality and controllability, with subjective evaluations confirming perceptual improvements in alignment and creativity, and show potential for composers to use the system as a compositional tool.
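The data flow described above (piano roll, then VAE latents, then diffusion) and in Figure 2 can be sketched as a preprocessing step. This is a minimal NumPy sketch under stated assumptions: a 128-pitch roll at 10 ms per frame, 128-frame segments, and a random linear projection standing in for the trained VAE encoder. The segment length, latent size and `encode_segment` stand-in are hypothetical, not the paper's configuration.

```python
import numpy as np

PITCHES = 128      # MIDI pitch rows in the piano roll (assumption)
SEG_FRAMES = 128   # frames per segment; 1.28 s at 10 ms per frame (assumption)


def split_into_segments(piano_roll, seg_frames=SEG_FRAMES):
    """Cut a (pitch, time) roll into equal-length segments, zero-padding the tail."""
    n_pitches, n_frames = piano_roll.shape
    n_segs = -(-n_frames // seg_frames)                  # ceiling division
    padded = np.zeros((n_pitches, n_segs * seg_frames), dtype=piano_roll.dtype)
    padded[:, :n_frames] = piano_roll
    # (pitch, n_segs, seg_frames) -> (n_segs, pitch, seg_frames)
    return padded.reshape(n_pitches, n_segs, seg_frames).transpose(1, 0, 2)


def encode_segment(segment, proj):
    """Hypothetical stand-in for a trained VAE encoder: a fixed linear projection."""
    return proj @ segment.reshape(-1)


def latent_sequence(piano_roll, proj):
    """Encode every segment and concatenate the latents along time."""
    segs = split_into_segments(piano_roll)
    return np.stack([encode_segment(s, proj) for s in segs])   # (n_segs, latent_dim)
```

For a 3-second roll (300 frames) and a 16-dimensional latent, `latent_sequence(np.zeros((PITCHES, 300)), np.random.default_rng(0).standard_normal((16, PITCHES * SEG_FRAMES)))` returns an array of shape (3, 16). Working on a short sequence of latents rather than on raw 10 ms frames is what keeps the diffusion backbone's sequence length manageable.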

Abstract

We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression; many of these rules are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluations of rule functions and works with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to strong standard baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
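Note density, one of the rules named above, shows why such rules resist gradient-based guidance. A minimal NumPy sketch: thresholding the piano roll and counting note onsets gives a piecewise-constant function, so its gradient is zero almost everywhere, yet it is cheap to evaluate forward, which is all SCG needs. The 100-frame window (1 s at 10 ms per frame), the 0.5 activation threshold and the squared-error loss are illustrative assumptions, not the paper's exact rule definitions.

```python
import numpy as np


def note_density(piano_roll, frames_per_window=100, threshold=0.5):
    """Count note onsets per time window of a (pitch, time) piano roll.

    Thresholding and onset detection make this piecewise constant,
    so its gradient is zero almost everywhere: a non-differentiable rule.
    """
    active = piano_roll > threshold                        # binarize the roll
    prev = np.zeros_like(active[:, :1])                    # nothing sounds before frame 0
    # An onset is a frame where a pitch is active but was not active one frame earlier.
    onsets = active & ~np.concatenate([prev, active[:, :-1]], axis=1)
    n_windows = piano_roll.shape[1] // frames_per_window
    counts = onsets[:, : n_windows * frames_per_window].sum(axis=0)
    return counts.reshape(n_windows, frames_per_window).sum(axis=1)


def rule_loss(piano_roll, target_density):
    """Squared error between measured and target density (forward evaluation only)."""
    measured = note_density(piano_roll)
    return float(np.sum((measured - np.asarray(target_density)) ** 2))
```

Because `rule_loss` is only ever called, never differentiated, it can be swapped for any black-box rule (a chord recognizer, for instance) without changing the guidance procedure.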

Paper Structure

This paper contains 38 sections, 3 theorems, 28 equations, 8 figures, 15 tables, and 3 algorithms.

Introduction
Related Works
Background
Non-Differentiable Rule Guidance
Rule Guidance Problem
Guidance via Stochastic Control
Practical Algorithms
General Theoretical Connection
Latent Diffusion Architecture
Experiments
Experimental Settings
Unconditional Generation
Individual Rule Guidance
Composite Rule Guidance
Ablation Studies
...and 23 more sections

Key Result

Theorem 4.1

Consider the dynamical system in Eq. (eq:controlled_sde). For a terminal cost defined as $\phi(\bm{\eta}_{T}) \stackrel{\Delta}{=} \ell_y(\bm{\eta}_{T}) \stackrel{\Delta}{=} -\log p(\mathbf{y} \mid \bm{\eta}_{T}) + \mathtt{const}$, and initial condition $\bm{\eta}_{0} \sim \mathcal{N}(\mathbf{0}, \mathb

Figures (8)

Figure 1: Overview of Stochastic Control Guidance (SCG) for plug-and-play non-differentiable rule guided generation. At each sampling step, we sample several realizations of the next step, and select the one yielding the most rule-compliant clean sample.
Figure 2: We use a VAE to encode piano roll segments to latent space and concatenate them for the next stage of diffusion training.
Figure 3: Subjective evaluation scores.
Figure 4: Training and validation curves of the classifiers trained on various rules.
Figure 5: Best loss (a) and loss range (b) over stochastic control guided DDPM sampling on a representative sample with note density as the conditioning rule.
...and 3 more figures
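The loop summarized in the Figure 1 caption (sample several realizations of the next step, keep the one whose predicted clean sample best satisfies the rule) can be sketched in NumPy. This is a toy under loudly stated assumptions: the "data" distribution is standard normal so the denoiser is exact in closed form, the rule (count of positive entries) is made up, and the schedule is arbitrary. It is not the paper's implementation; the paper's algorithms may differ, for example in how candidates are scored. It also records the per-step best loss and loss range, the two quantities plotted in Figure 5.

```python
import numpy as np

# Toy setting (assumption): "data" x0 ~ N(0, I), so every noisy marginal is also
# N(0, I) and the optimal denoiser is exact: E[x0 | x_t] = sqrt(abar_t) * x_t.
# A trained diffusion model would replace predict_x0 / ddpm_step.
T = 50
betas = np.concatenate([[0.0], np.linspace(1e-4, 0.05, T)])  # betas[0] is unused
alphas = 1.0 - betas
abar = np.cumprod(alphas)                                      # abar[0] == 1


def predict_x0(x, t):
    """Posterior-mean estimate of the clean sample from x_t (exact in this toy)."""
    return np.sqrt(abar[t]) * x


def ddpm_step(x, t, rng):
    """Draw one realization of x_{t-1} from the DDPM reverse step."""
    eps_hat = np.sqrt(1.0 - abar[t]) * x        # noise prediction implied by predict_x0
    mean = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)


def rule_loss(x0, target_positive):
    """Hypothetical non-differentiable rule: distance of the positive-entry count from a target."""
    return abs(int(np.sum(x0 > 0)) - target_positive)


def scg_sample(dim, target_positive, n_candidates=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)                # x_T ~ N(0, I)
    best_losses, loss_ranges = [], []
    for t in range(T, 0, -1):
        # Sample several realizations of the next step x_{t-1} ...
        cands = [ddpm_step(x, t, rng) for _ in range(n_candidates)]
        # ... score each one by forward-evaluating the rule on its predicted clean sample ...
        losses = [rule_loss(predict_x0(c, t - 1), target_positive) for c in cands]
        # ... and keep the most rule-compliant candidate (no gradients needed).
        best = int(np.argmin(losses))
        x = cands[best]
        best_losses.append(losses[best])
        loss_ranges.append(max(losses) - min(losses))
    return x, best_losses, loss_ranges
```

With `n_candidates=1` the loop reduces to ordinary DDPM sampling, which gives a convenient unguided baseline; increasing `n_candidates` trades more candidate scoring per step (one clean-sample prediction and one rule evaluation each) for tighter rule compliance.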

Theorems & Definitions (6)

Theorem 4.1: proof in the Appendix
Proposition 4.2: proof in the Appendix
Lemma 1.1 (dai1991stochastic; pavon1989stochastic)
proof
proof
Remark 1.2
