Notochord: a Flexible Probabilistic Model for Real-Time MIDI Performance

Victor Shepardson, Jack Armitage, Thor Magnusson

TL;DR

Notochord tackles the challenge of real-time musical performance by introducing a low-latency deep probabilistic model for MIDI sequences. It employs an order-agnostic, autoregressive, event-based representation with a GRU backbone and sub-event conditioning, enabling fine-grained interventions while preserving responsiveness. Trained on the large Lakh MIDI Dataset with robust data augmentation, Notochord achieves inference latencies around $6\,\mathrm{ms}$ per event and sampling latencies around $3\,\mathrm{ms}$, supporting steerable generation, auto-pitch, harmonization, live coding, and other interactive tasks. The work combines a rigorous probabilistic formulation with practical real-time capabilities, and provides open-source code and examples to foster experimentation in embodied musical AI.

Abstract

Deep learning-based probabilistic models of musical data are producing increasingly realistic results and promise to enter creative workflows of many kinds. Yet they have been little-studied in a performance setting, where the results of user actions typically ought to feel instantaneous. To enable such study, we designed Notochord, a deep probabilistic model for sequences of structured events, and trained an instance of it on the Lakh MIDI dataset. Our probabilistic formulation allows interpretable interventions at a sub-event level, which enables one model to act as a backbone for diverse interactive musical functions including steerable generation, harmonization, machine improvisation, and likelihood-based interfaces. Notochord can generate polyphonic and multi-track MIDI, and respond to inputs with latency below ten milliseconds. Training code, model checkpoints and interactive examples are provided as open source software.

Paper Structure (21 sections, 1 equation, 6 figures)

Figures (6)

  • Figure 1: Architecture of the Notochord model at training time. Rectangular blocks are functions, long capsules are embedding vectors, and short capsules are hidden states. Each sub-event depends on previous events via a GRU, and also on a random subset of the other sub-events. Conditioning of each sub-event on other sub-events is achieved by simply adding their embeddings to the hidden state after passing it through an MLP $f_h$. The addition can be implemented in parallel as a batched matrix multiplication at training time. This is depicted with black cells indicating a one, gray cells a random binary value as proposed in Section \ref{order_agnostic}, and white cells a zero. A final MLP per sub-event maps the summed embeddings and hidden states to distribution parameters. MLP architecture is shown as an inset, top right.
  • Figure 2: Bootstrap 99% confidence intervals for negative log likelihoods (NLL) computed over the validation set (lower is better). On the left, NLL is broken out by sub-event modality (instrument, pitch, time, velocity) and by which other sub-events each is conditioned on. In the leftmost position of each subplot, the sub-event is conditioned only on previous events via hidden state (S) and then from left to right on larger combinations of other sub-events. On the right, total NLL per event is reported for every permutation of sub-event order.
  • Figure 3: A sequence of conditional distributions (Section \ref{sub-events}) from sampling the model. Sub-events are ordered from top to bottom, then events left to right; red lines indicate sampled values. In this example, the discrete distribution over instrument (orange, top left) is sampled first, then pitch (pink), then the mixture density over time (green), and velocity (blue). Sampling continues in the right column, beginning again with instrument for the second event. Note how the initially higher entropy of the instrument distribution (top left) collapses to a very high probability of sampling the same instrument again (top right); and how the velocity value sampled first (bottom left) becomes a more likely value for the second sample (bottom right).
  • Figure 4: Piano-roll visualization of event streams generated by sampling Notochord. We encourage diversity by sampling the instrument of the first event uniformly from the General MIDI instruments instead of using the model prior, which, like the Lakh MIDI Dataset (LMD), is heavily biased toward instrument 1 (see Figure \ref{fig:distributions}).
  • Figure 5: Implementation of the neural harmonizer. Input events from a MIDI controller are in the bottom row. At the top is a sequence of model states annotated with queries for each harmonizing pitch. The combined stream of events from the player, Notochord, and the scheduler appears in the middle. In this example, the player strikes two notes before releasing each of them. The scheduler tracks which harmonizing pitches are associated with which performed pitches in order to generate matching note-offs.
  • ...and 1 more figure
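
The masked-sum conditioning described in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 4×5 mask layout (one always-on column for the hidden state, a zero diagonal, and independent coin flips elsewhere), and the probability `p` are assumptions made for illustration, and the paper's order-agnostic training section defines the actual sampling scheme. Plain Python lists stand in for tensors so the example runs without dependencies.

```python
import random

# Sub-event modalities named in the figure captions.
SUB_EVENTS = ["instrument", "pitch", "time", "velocity"]


def sample_conditioning_mask(rng, p=0.5):
    """Sample a binary mask with one row per predicted sub-event.

    Column 0 selects the (MLP-transformed) hidden state and is always 1.
    Column j + 1 selects the embedding of sub-event j: it is 0 on the
    diagonal (a sub-event is never conditioned on itself) and a coin flip
    with probability p elsewhere. This layout is an assumption.
    """
    n = len(SUB_EVENTS)
    mask = []
    for i in range(n):
        row = [1]  # always condition on the hidden state
        for j in range(n):
            if i == j:
                row.append(0)
            else:
                row.append(1 if rng.random() < p else 0)
        mask.append(row)
    return mask


def condition(mask, hidden, embeddings):
    """Masked sum, written as a matrix product.

    mask is n x (n + 1); the stacked [hidden] + embeddings is (n + 1) x d.
    Row i of the result is the input to the output MLP of sub-event i.
    """
    stacked = [hidden] + embeddings
    d = len(hidden)
    return [
        [sum(m * vec[k] for m, vec in zip(row, stacked)) for k in range(d)]
        for row in mask
    ]


rng = random.Random(0)
mask = sample_conditioning_mask(rng)
hidden = [0.1, -0.2, 0.3]  # stands in for f_h applied to the GRU state
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
inputs = condition(mask, hidden, embeddings)
for name, row, vec in zip(SUB_EVENTS, mask, inputs):
    print(name, row, vec)
```

Writing the conditioning as a single product of a mask with the stacked vectors is what lets all four sub-event predictions be computed in parallel during training; at inference time the same sum would be built up one sub-event at a time as values are sampled or supplied by the performer.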