Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

TL;DR

This work tackles controllable symbolic music generation under non-differentiable rules by introducing Stochastic Control Guidance (SCG), a plug-and-play mechanism that guides pre-trained diffusion models using forward rule evaluations rather than gradients. It couples SCG with a latent diffusion architecture built on a VAE-encoded piano-roll representation and a transformer-based diffusion backbone to achieve high time resolution (10 ms) suitable for musical expressivity. Theoretical grounding in stochastic optimal control and path integral methods provides a principled route to steer diffusion samples toward rule-compliant outputs, while practical algorithms enable efficient, training-free guidance even for black-box rules. Empirical results across unconditional and rule-guided tasks show higher music quality and controllability than the baselines. Subjective evaluations confirm perceptual improvements in alignment and creativity and suggest that composers could use the system as a compositional tool.

Abstract

We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression. Many of these rules are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and works with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
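
To make the notion of a non-differentiable rule concrete, the following is a minimal sketch of a note-density rule evaluated on a binary piano roll. The array layout (pitches by time frames), the names `note_density` and `rule_loss`, the 0.5 threshold, and the onset-counting definition are illustrative assumptions, not the paper's implementation. Because the rule thresholds the input and counts discrete events, it offers no useful gradient with respect to the piano roll, so a guidance method can rely only on forward evaluations like the ones below.

```python
import numpy as np

def note_density(piano_roll: np.ndarray, threshold: float = 0.5) -> int:
    """Count note onsets in a (pitches, frames) piano roll (assumed layout)."""
    active = piano_roll > threshold
    # Prepend one silent frame so that notes starting at frame 0 count as onsets.
    silent = np.zeros((active.shape[0], 1), dtype=bool)
    previous = np.concatenate([silent, active], axis=1)[:, :-1]
    # An onset is a cell that is active now but was inactive one frame earlier.
    onsets = active & ~previous
    return int(onsets.sum())

def rule_loss(piano_roll: np.ndarray, target_density: int) -> int:
    """Distance between the observed and the requested number of onsets."""
    return abs(note_density(piano_roll) - target_density)

# Toy piano roll: one note followed by a two-note chord.
roll = np.zeros((128, 16))
roll[60, 0:4] = 1.0
roll[64, 4:8] = 1.0
roll[67, 4:8] = 1.0

density = note_density(roll)
loss = rule_loss(roll, target_density=5)
```

Here the three sustained notes produce three onsets, so a target density of 5 gives a loss of 2.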

Paper Structure

This paper contains 38 sections, 3 theorems, 28 equations, 8 figures, 15 tables, 3 algorithms.

Key Result

Theorem 4.1

Consider the dynamical system in Eq. \ref{eq:controlled_sde}. For a terminal cost defined as $\phi(\bm{\eta}_{T}) \stackrel{\Delta}{=} \ell_y(\bm{\eta}_{T}) \stackrel{\Delta}{=} - \log p(\mathbf{y} | \bm{\eta}_{T}) + \mathtt{const}$, and initial condition $\bm{\eta}_{0} \sim \mathcal{N}(\mathbf{0}, \mathb

Figures (8)

  • Figure 1: Overview of Stochastic Control Guidance (SCG) for plug-and-play non-differentiable rule guided generation. At each sampling step, we sample several realizations of the next step, and select the one yielding the most rule-compliant clean sample.
  • Figure 2: We use a VAE to encode piano roll segments to latent space and concatenate them for the next stage of diffusion training.
  • Figure 3: Subjective evaluation scores.
  • Figure 4: Training and validation curves of the classifiers trained on various rules.
  • Figure 5: Best loss (a) and loss range (b) over stochastic control guided DDPM sampling on a representative sample with note density as the conditioning rule.
  • ...and 3 more figures
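
The selection procedure described in the Figure 1 caption can be sketched as follows. This is a hedged illustration based only on that caption: `scg_step` and its arguments are hypothetical names, and `propose_next`, `predict_clean`, and `rule_loss` are toy stand-ins for a pre-trained diffusion sampler, its clean-sample estimate, and a black-box rule. None of them come from the authors' code.

```python
import numpy as np

def scg_step(x_t, t, propose_next, predict_clean, rule_loss, n_candidates, rng):
    """One guided step: draw several next-step candidates, keep the best one."""
    candidates = [propose_next(x_t, t, rng) for _ in range(n_candidates)]
    # Score each candidate by the rule loss of its estimated clean sample.
    losses = [rule_loss(predict_clean(c, t - 1)) for c in candidates]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]

def propose_next(x_t, t, rng):
    # Toy reverse step (assumption): shrink and add noise. NOT a trained model.
    return 0.9 * x_t + 0.3 * rng.standard_normal(x_t.shape)

def predict_clean(x, t):
    # Placeholder (assumption): a real sampler would use the denoiser's estimate.
    return x

def rule_loss(x0, target=3):
    # Non-differentiable toy rule: the number of entries above 0.5 should be 3.
    return abs(int((x0 > 0.5).sum()) - target)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
for t in range(20, 0, -1):
    x, loss = scg_step(x, t, propose_next, predict_clean, rule_loss,
                       n_candidates=16, rng=rng)
```

Only forward calls to `rule_loss` are needed, so the rule can be an arbitrary black box. In this sketch, raising `n_candidates` spends more rule evaluations per step in exchange for a better chance of finding a compliant candidate.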

Theorems & Definitions (6)

  • Theorem 4.1: proof in Appendix \ref{sec:thm1_proof}
  • Proposition 4.2: proof in Appendix \ref{sec:thm2_proof}
  • Lemma 1.1: dai1991stochastic, pavon1989stochastic
  • proof
  • proof
  • Remark 1.2