Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance
Victor Shepardson, Jack Armitage, Thor Magnusson
TL;DR
Notochord is a low-latency deep probabilistic model of MIDI sequences, designed for real-time musical performance. It uses an order-agnostic, autoregressive, event-based representation with a GRU backbone and sub-event conditioning, which allows fine-grained interventions without sacrificing responsiveness. Trained on the Lakh MIDI Dataset with data augmentation, Notochord achieves inference latencies of around $6\,\mathrm{ms}$ per event and sampling latencies of around $3\,\mathrm{ms}$, and supports steerable generation, auto-pitch, harmonization, live coding, and other interactive tasks. Open-source code and examples are provided to support experimentation in embodied musical AI.
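One way to read "order-agnostic, autoregressive" together with "sub-event conditioning" is through the chain rule of probability. The notation below is an assumption introduced for illustration and is not taken from the text: $x_t$ is the $t$-th event, $x_t^{(1)}, \dots, x_t^{(K)}$ are its $K$ sub-events (for MIDI, fields such as instrument, pitch, timing, and velocity would be natural candidates), and $\pi$ is any permutation of $\{1, \dots, K\}$.

$$
p(x_t \mid x_{<t}) = \prod_{k=1}^{K} p\big(x_t^{(\pi(k))} \mid x_t^{(\pi(1))}, \dots, x_t^{(\pi(k-1))},\, x_{<t}\big)
$$

Because this identity holds for every ordering $\pi$, a model that can evaluate each factor can let a performer fix some sub-events and sample the remaining ones conditioned on them.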
Abstract
Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study, we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI Dataset. Our probabilistic formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds. Training code, model checkpoints, and interactive examples are provided as open-source software.
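The idea of an intervention at the sub-event level can be illustrated with a toy example. The sketch below is not Notochord's code or API: the probability table, the field names, and the functions `conditional` and `sample_event` are hypothetical, and a hand-written joint distribution stands in for a neural network conditioned on the event history. It shows only how restricting one sub-event (here, pitch) and sampling the others conditioned on it can steer generation.

```python
import random

# Toy joint distribution over two sub-events of a single event.
# The numbers are made up; a real model would compute them from the
# history of previous events.
JOINT = {
    ("piano", 60): 0.30,
    ("piano", 61): 0.10,
    ("piano", 64): 0.20,
    ("bass", 36): 0.25,
    ("bass", 37): 0.05,
    ("bass", 40): 0.10,
}

FIELDS = ("instrument", "pitch")  # hypothetical sub-event names


def conditional(field, given):
    """Return the distribution of `field` given already-fixed sub-events."""
    idx = FIELDS.index(field)
    weights = {}
    for event, p in JOINT.items():
        # Keep only joint entries consistent with the fixed sub-events.
        if all(event[FIELDS.index(k)] == v for k, v in given.items()):
            weights[event[idx]] = weights.get(event[idx], 0.0) + p
    total = sum(weights.values())
    return {value: w / total for value, w in weights.items()}


def sample_event(order, allowed=None, rng=random):
    """Sample sub-events one at a time in `order`.

    `allowed` maps a field name to the set of values the performer permits.
    Disallowed values are masked out and the remainder is renormalised.
    """
    allowed = allowed or {}
    fixed = {}
    for field in order:
        dist = conditional(field, fixed)
        if field in allowed:
            dist = {v: p for v, p in dist.items() if v in allowed[field]}
            if not dist:
                raise ValueError(f"no permitted value left for {field!r}")
            total = sum(dist.values())
            dist = {v: p / total for v, p in dist.items()}
        values = list(dist)
        weights = [dist[v] for v in values]
        fixed[field] = rng.choices(values, weights=weights)[0]
    return fixed


if __name__ == "__main__":
    # Restrict pitch to a chosen set, then let the toy model pick an
    # instrument that is consistent with the sampled pitch.
    in_scale = {36, 40, 60, 64}
    print(sample_event(("pitch", "instrument"), allowed={"pitch": in_scale}))
```

In this toy setting the constrained sub-event is sampled first, so every later sub-event is conditioned on a value the performer accepts. Sampling in the opposite order with the same constraint can raise `ValueError` if the chosen instrument has no permitted pitch, which is one reason flexibility in sub-event order is useful for this kind of intervention.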
