Online Symbolic Music Alignment with Offline Reinforcement Learning
Silvan David Peter
TL;DR
The paper tackles online symbolic music alignment with an attention-based reinforcement learning agent, trained offline, that predicts the current score position from the local score context and the preceding performance context, enabling real-time note-wise alignment. It complements this online approach with an offline alignment model built on two DTW steps (pitch-sequence DTW followed by onset-time DTW). This offline model outperforms a state-of-the-art offline reference method on multiple piano datasets. The online framework yields two practical score-following strategies, the Greedy Agent Model and the Online Alignment Model. Both achieve high onset-detection accuracy and strong real-time tracking, and they approach offline performance in note-wise alignment. Overall, the work establishes offline RL as a viable, effective paradigm for symbolic music alignment and provides reproducible resources to advance research in this area.
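As a rough illustration of the first, pitch-based step of such a two-step DTW procedure, the sketch below aligns a score pitch sequence to a performed pitch sequence with classic DTW. It is a minimal sketch under stated assumptions, not the paper's implementation. The 0/1 pitch-mismatch cost, the standard three-move step pattern, and the function name `dtw_path` are all illustrative choices, and the second, onset-time step is omitted.

```python
import numpy as np


def dtw_path(score_pitches, perf_pitches):
    """Hypothetical helper: classic DTW over two pitch sequences.

    Returns the accumulated alignment cost and the warping path as a list of
    (score_index, performance_index) pairs.
    """
    s = np.asarray(score_pitches)
    p = np.asarray(perf_pitches)
    n, m = len(s), len(p)
    # Assumed local cost: 0 if the pitches match, 1 otherwise.
    cost = (s[:, None] != p[None, :]).astype(float)

    # Accumulated-cost matrix with an extra row/column of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # diagonal: advance in both sequences
                acc[i - 1, j],      # advance in the score only
                acc[i, j - 1],      # advance in the performance only
            )

    # Backtrack from the bottom-right cell to recover the optimal path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, m]), path[::-1]


score = [60, 62, 64, 65]            # C4 D4 E4 F4
performance = [60, 62, 62, 64, 65]  # D4 played twice
total, path = dtw_path(score, performance)
print(total)  # 0.0
print(path)   # [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
```

Here the repeated D4 is absorbed by mapping both performed copies to the same score note, so the total cost is 0. A practical aligner would also need to handle inserted and omitted notes as well as chords, which this simple cost does not model.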
Abstract
Symbolic music alignment is the process of matching performed MIDI notes to their corresponding score notes. In this paper, we introduce a reinforcement learning (RL)-based technique for online symbolic music alignment. The RL agent, an attention-based neural network, iteratively estimates the current score position from local score and performance contexts. For this symbolic alignment task, environment states can be sampled exhaustively and the reward is dense, which makes it straightforward to formulate the task as a simplified offline RL problem. We evaluate the trained agent in three ways: first, on its ability to identify correct score positions for sampled test contexts; second, as the core of a complete algorithm for online symbolic note-wise alignment; and finally, as a real-time symbolic score follower. We further investigate the pitch-based score and performance representations used as the agent's inputs. To this end, we develop a second model, a two-step Dynamic Time Warping (DTW)-based offline alignment algorithm that uses the same input representation. This model outperforms a state-of-the-art reference model of offline symbolic music alignment.
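The claim that states can be sampled exhaustively and that the reward is dense can be made concrete with a toy sketch. Everything below is an assumption for illustration and is not taken from the paper. This includes the helper names (`enumerate_states`, `local_contexts`, `dense_reward`, `best_action`), the action being a relative correction of the current score-position estimate, and the reward being the negative distance to the true position. Given a ground-truth alignment, every (performed note, position estimate) pair can be listed and every action scored, so the optimal action is known for each state.

```python
from typing import Iterator, List, Sequence, Tuple


def dense_reward(belief: int, action: int, true_pos: int) -> int:
    """Hypothetical reward: 0 when the corrected estimate hits the true
    score position, more negative the further away it lands."""
    return -abs(belief + action - true_pos)


def local_contexts(score_pitches: Sequence[int], perf_pitches: Sequence[int],
                   t: int, belief: int, window: int = 2, history: int = 3
                   ) -> Tuple[List[int], List[int]]:
    """Hypothetical state: score pitches around the current position estimate
    plus the last `history` performed pitches up to and including note t."""
    score_ctx = list(score_pitches[max(belief - window, 0): belief + window + 1])
    perf_ctx = list(perf_pitches[max(t - history + 1, 0): t + 1])
    return score_ctx, perf_ctx


def enumerate_states(alignment: Sequence[int], n_score: int,
                     max_offset: int = 2) -> Iterator[Tuple[int, int, int]]:
    """Exhaustively list (t, belief, true_pos) triples: every performed note t
    paired with every in-range position estimate within max_offset of the
    score index it is aligned to."""
    for t, true_pos in enumerate(alignment):
        for off in range(-max_offset, max_offset + 1):
            belief = true_pos + off
            if 0 <= belief < n_score:
                yield t, belief, true_pos


def best_action(belief: int, true_pos: int, actions: Sequence[int]) -> int:
    """With the reward known for every action, the optimal action can be read
    off directly, much like a supervised learning target."""
    return max(actions, key=lambda a: dense_reward(belief, a, true_pos))


score = [60, 62, 64, 65]
performance = [60, 62, 62, 64, 65]
alignment = [0, 1, 1, 2, 3]  # score index matched to each performed note
actions = range(-2, 3)       # assumed: relative corrections of the estimate

states = list(enumerate_states(alignment, n_score=len(score)))
print(len(states))                                        # 18
print(local_contexts(score, performance, t=2, belief=1))  # ([60, 62, 64, 65], [60, 62, 62])
print(best_action(3, 1, actions))                         # -2
```

Because every state and the reward of every action follow directly from the ground-truth alignment, an agent can be trained on this fixed dataset without interacting with a live environment. This is one way to read the abstract's description of the task as a simplified offline RL problem.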
