Composer's Assistant 2: Interactive Multi-Track MIDI Infilling with Fine-Grained User Control

Martin E. Malandro

TL;DR

Composer's Assistant 2 is a DAW-integrated system for interactive multi-track MIDI infilling that adds rhythmic, density, and pitch controls, implemented with a T5-like Transformer backbone. Its token-based control language, which includes DNOC along with explicit pitch-range and rhythmic conditioning, lets users steer outputs at a fine-grained level within a REAPER workflow. Objective metrics and analyses of how well the model understands its control tokens show substantial gains over the original Composer's Assistant, and a listening study found no significant difference between real music and music co-created with the system. The complete system is released, including source code, pretrained models, and REAPER scripts.

Abstract

We introduce Composer's Assistant 2, a system for interactive human-computer composition in the REAPER digital audio workstation. Our work upgrades the Composer's Assistant system (which performs multi-track infilling of symbolic music at the track-measure level) with a wide range of new controls to give users fine-grained control over the system's outputs. Controls introduced in this work include two types of rhythmic conditioning controls, horizontal and vertical note onset density controls, several types of pitch controls, and a rhythmic interest control. We train a T5-like transformer model to implement these controls and to serve as the backbone of our system. With these controls, we achieve a dramatic improvement in objective metrics over the original system. We also study how well our model understands the meaning of our controls, and we conduct a listening study that does not find a significant difference between real music and music composed in a co-creative fashion with our system. We release our complete system, consisting of source code, pretrained models, and REAPER scripts.

Paper Structure (14 sections, 1 equation, 7 figures, 2 tables)

Figures (7)

  • Figure 1: A 4-measure prompt in REAPER, followed by a model output. Users place empty MIDI items in REAPER to tell the model in which measures to write notes, and use track names to tell the model which instrument is on each track. A track-measure in the prompt is boxed.
  • Figure 2: A prompt with 1D rhythmic conditioning in REAPER, followed by a model output. Users draw the rhythms they want in the selected MIDI items, and the model chooses pitches for these rhythms that fit with the rest of the prompt. Unselected MIDI items are included in the prompt to the encoder, and remain unchanged.
  • Figure 3: Six examples of rhythmic interest levels. Points mark note onsets in a 4/4 measure.
  • Figure 4: Horizontal density distributions.
  • Figure 5: The observed probability of success for each control token in our empty prompting test.
  • ...and 2 more figures