Online Symbolic Music Alignment with Offline Reinforcement Learning

Silvan David Peter

TL;DR

The paper tackles online symbolic music alignment with an offline-trained, attention-based reinforcement learning agent that predicts the current score position from local score context and past performance context, enabling real-time note-wise alignment. Alongside this online approach, it presents an offline model built on two DTW steps (pitch-sequence DTW followed by onset-time DTW), which outperforms a state-of-the-art offline reference model on multiple piano datasets. The online framework yields two practical score-following strategies, the Greedy Agent Model and the Online Alignment Model, which achieve high onset-detection accuracy and strong real-time tracking, and approach offline performance in note-wise alignment while remaining feasible in real time. Overall, the work establishes offline RL as a viable, effective paradigm for symbolic music alignment and provides reproducible resources to advance research in this area.

Abstract

Symbolic music alignment is the process of matching performed MIDI notes to the corresponding score notes. In this paper, we introduce a reinforcement learning (RL)-based technique for online symbolic music alignment. The RL agent, an attention-based neural network, iteratively estimates the current score position from local score and performance contexts. For this symbolic alignment task, environment states can be sampled exhaustively and the reward is dense, which makes it straightforward to formulate the task as a simplified offline RL problem. We evaluate the trained agent in three ways: first, in its capacity to identify correct score positions for sampled test contexts; second, as the core technique of a complete algorithm for symbolic online note-wise alignment; and finally, as a real-time symbolic score follower. We further investigate the pitch-based score and performance representations used as the agent's inputs. To this end, we develop a second model, a two-step Dynamic Time Warping (DTW)-based offline alignment algorithm that uses the same input representation. This second model outperforms a state-of-the-art reference model of offline symbolic music alignment.
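
The pitch-sequence step of the DTW-based offline model can be illustrated with a small sketch. It assumes the representation described in the Figure 1 caption: the score is a sequence of pitch sets, the performance is a sequence of single pitches in played order, and the pairwise distance is 0 when the performed pitch belongs to the score pitch set and 1 otherwise. The function names, the toy data, and the symmetric step pattern are illustrative assumptions, not the paper's implementation, and the backtracking returns only one of possibly several equivalent optimal paths.

```python
from typing import List, Set, Tuple


def pitch_distance(score_pitches: Set[int], performed_pitch: int) -> int:
    """Assumed 0/1 metric: 0 if the performed pitch is in the score pitch set."""
    return 0 if performed_pitch in score_pitches else 1


def dtw_path(score: List[Set[int]], performance: List[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the total cost and one optimal warping path as (score_idx, perf_idx) pairs."""
    n, m = len(score), len(performance)
    if n == 0 or m == 0:
        raise ValueError("score and performance must be non-empty")
    inf = float("inf")
    # cost[i][j]: cheapest alignment of the first i score sets with the first j notes
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = pitch_distance(score[i - 1], performance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to the start, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        predecessors = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(predecessors)
        path.append((i - 1, j - 1))
    path.reverse()
    return int(cost[n][m]), path


# Toy example (hypothetical MIDI pitches): a two-note chord played as an arpeggio.
score = [{60}, {64, 67}, {72}]
performance = [60, 67, 64, 72]
total_cost, path = dtw_path(score, performance)
```

In this toy case both arpeggiated notes warp to the same score pitch set, so the total cost is zero. A second pass over onset times, as named in the TL;DR, would then be needed to resolve which performed note matches which score note.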

Paper Structure (19 sections, 2 equations, 4 figures, 4 tables)

This paper contains 19 sections, 2 equations, 4 figures, 4 tables.

Introduction
Related Work
Offline Symbolic Music Alignment
Pitch Sequence Warping
Onset Sequence Warping
Offline Model Evaluation
Online Alignment Agent
Alignment as Reinforcement Learning
Simplified Deep Q-learning
Value Function Model
Training
Online Models
Online Evaluation
Agent Evaluation
Online Note-wise Alignment
...and 4 more sections

Figures (4)

Figure 1: First half measure of Chopin Op. 9 No. 2 (bottom score), encoded as pitch set sequence (left) and warped to its performance, encoded as sequence of pitches as played (top). The matrix shows the corresponding pairwise distance (shaded cells indicate a distance of 1, see equation \ref{eq:dtw}); red lines indicate equivalent optimal warping paths.
Figure 2: Setup of the value function model: states are encoded as a contiguous token sequence of past performance (red) and current score (blue) contexts. Pitch set embeddings are summed over individual pitch embeddings. The model is set up as a token classifier, since each score onset in the context corresponds to a possible action (= "select this onset as next score onset") and is classified according to its expected reward class. The vector on the right shows the reward probability for each action (pink).
Figure 3: Schematic overview of the Online Alignment Model with a monophonic piece and a context of 8 onsets. Score (blue, top) and performance (red) contexts are inputs to the Value Function Model, which outputs value estimates (pink, bottom). The top three onsets (1,2,3) are passed to a tempo extrapolator, along with existing alignments (yellow). The tempo extrapolator predicts three onsets ($\hat{1},\hat{2},\hat{3}$) for the candidate onsets. The one with lowest distance ($\Delta_1$) to the newest performance note (*) is aligned (pink).
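
The candidate selection described in the Figure 3 caption can be sketched as follows. The caption does not specify how the tempo extrapolator works, so this sketch assumes a simple linear extrapolation from the two most recent aligned (score onset, performance time) pairs. The function names `extrapolate_onset` and `select_candidate` and all numbers are hypothetical.

```python
from typing import List, Tuple


def extrapolate_onset(aligned: List[Tuple[float, float]], score_onset: float) -> float:
    """Predict the performance time (seconds) of a score onset (beats).

    Assumption: local tempo is constant and estimated from the last two
    aligned (score_onset_in_beats, performance_time_in_seconds) pairs.
    """
    (s0, p0), (s1, p1) = aligned[-2], aligned[-1]
    if s1 == s0:
        raise ValueError("last two aligned score onsets must differ")
    seconds_per_beat = (p1 - p0) / (s1 - s0)
    return p1 + (score_onset - s1) * seconds_per_beat


def select_candidate(candidates: List[float],
                     aligned: List[Tuple[float, float]],
                     new_note_time: float) -> float:
    """Pick the candidate score onset whose predicted time is closest to the new note."""
    predictions = [extrapolate_onset(aligned, c) for c in candidates]
    best = min(range(len(candidates)),
               key=lambda k: abs(predictions[k] - new_note_time))
    return candidates[best]


# Hypothetical data: two aligned notes at 0.5 s per beat, three candidate onsets
# (standing in for the top three onsets ranked by the value function model).
aligned = [(0.0, 0.0), (1.0, 0.5)]
candidates = [1.5, 2.0, 3.0]
chosen = select_candidate(candidates, aligned, new_note_time=1.02)
```

Here the predicted times are 0.75 s, 1.0 s and 1.5 s, so the candidate at beat 2.0 is closest to the new note at 1.02 s and is selected.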
