SYMPLEX: Controllable Symbolic Music Generation using Simplex Diffusion with Vocabulary Priors
Nicolas Jonason, Luca Casini, Bob L. T. Sturm
TL;DR
SYMPLEX addresses fast, controllable symbolic music generation by introducing simplex diffusion operating on probability distributions $p_t$ over an unordered note-event vocabulary. The framework trains a denoising network to recover clean distributions from noisy inputs, and performs iterative inference to generate 4-bar multi-instrument MIDI loops, with controllability achieved via vocabulary priors $p_v$ applied during decoding. Key contributions include the first application of simplex diffusion to symbolic music, a method to steer generation through input priors without task-specific fine-tuning, and an extended loop extraction pipeline that uses metrical-structure information to build a large MIDI-loop dataset. The approach combines a transformer-based encoder with an orderless representation to enable tasks like infill, variations, and instrument/pitch constraints, offering plug-and-play guidance and efficient inference for programmable music generation.
Abstract
We present a new approach for fast and controllable generation of symbolic music based on simplex diffusion, which is essentially a diffusion process operating on probabilities rather than on the signal space. This objective has been applied in domains such as natural language processing, but here we apply it to generating 4-bar multi-instrument music loops using an orderless representation. We show that our model can be steered with vocabulary priors, which affords a considerable level of control over the music generation process, for instance infilling in time and pitch and choice of instrumentation, all without task-specific model adaptation or extrinsic control.
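
To make the idea of steering with a vocabulary prior concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that a prior $p_v$ acts as a per-token weight that multiplies a distribution $p_t$ over the vocabulary, followed by renormalization. The function name `apply_vocabulary_prior`, the four-token toy vocabulary, and the `instrument:pitch` token naming are hypothetical and chosen only for illustration.

```python
def apply_vocabulary_prior(p_t, p_v, eps=1e-12):
    """Reweight a distribution over the vocabulary by a prior, then renormalize.

    Assumption: the prior is combined by elementwise multiplication.
    """
    if len(p_t) != len(p_v):
        raise ValueError("p_t and p_v must have the same length")
    weighted = [p * v for p, v in zip(p_t, p_v)]
    total = sum(weighted)
    if total <= eps:
        raise ValueError("prior excludes every token with nonzero probability")
    return [w / total for w in weighted]


# Hypothetical toy vocabulary of four note-event tokens.
vocab = ["piano:C4", "piano:E4", "drums:kick", "bass:C2"]

# Example distribution over the vocabulary for a single note event.
p_t = [0.4, 0.3, 0.2, 0.1]

# Prior that allows only piano tokens (an instrumentation constraint).
p_v = [1.0 if tok.startswith("piano:") else 0.0 for tok in vocab]

constrained = apply_vocabulary_prior(p_t, p_v)
```

Under this assumption, the constrained distribution places all of its mass on the two piano tokens while preserving their relative probabilities. A prior restricted to a pitch range or to a time window would be built in the same way, by zeroing the entries of the tokens to exclude.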
