TheGlueNote: Learned Representations for Robust and Flexible Note Alignment
Silvan David Peter, Gerhard Widmer
TL;DR
This work tackles robust symbolic note alignment between two MIDI versions of the same piece, focusing on large mismatches such as repeats, skips, and ornamentations. It introduces TheGlueNote, a transformer encoder that learns note-wise representations from 512-note windows and outputs a 513-note similarity matrix used to identify matches, with three post-processing options, including DTW. Trained on synthetically augmented MIDI data, TheGlueNote performs on par with the state of the art and is robust to mismatches, while operating directly on plain MIDI without requiring quantization or score annotations. Ablation studies indicate that DTW-based post-processing on the learned representations is effective and has favorable runtime. This approach advances robust symbolic note alignment and opens avenues for end-to-end and cross-domain extensions.
Abstract
Note alignment refers to the task of matching individual notes of two versions of the same symbolically encoded piece. Methods addressing this task commonly rely on sequence alignment algorithms such as Hidden Markov Models or Dynamic Time Warping (DTW) applied directly to note or onset sequences. While successful in many cases, such methods struggle with large mismatches between the versions. In this work, we learn note-wise representations from data augmented with various complex mismatch cases, e.g., repeats, skips, block insertions, and long trills. At the heart of our approach lies a transformer encoder network, TheGlueNote, which predicts pairwise note similarities for two 512-note subsequences. We postprocess the predicted similarities using flavors of weightedDTW and pitch-separated onsetDTW to retrieve note matches for two sequences of arbitrary length. Our approach performs on par with the state of the art in terms of note alignment accuracy, is considerably more robust to version mismatches, and works directly on any pair of MIDI files.
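To illustrate how a pairwise similarity matrix can be turned into note matches, the following is a minimal sketch of plain DTW run on the cost 1 - similarity. It is not the weightedDTW or pitch-separated onsetDTW used in this work; the function names `dtw_path` and `keep_confident_matches`, the toy matrix, and the 0.5 threshold are assumptions made purely for illustration.

```python
import numpy as np


def dtw_path(similarity):
    """Plain DTW over cost = 1 - similarity (illustrative, not the paper's variant)."""
    cost = 1.0 - np.asarray(similarity, dtype=float)
    n, m = cost.shape
    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    # Backtrack from the last cell to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def keep_confident_matches(path, similarity, threshold=0.5):
    """Drop path cells whose similarity is below an assumed threshold."""
    sim = np.asarray(similarity, dtype=float)
    return [(i, j) for i, j in path if sim[i, j] >= threshold]


# Toy example: version B contains one extra note (index 1) missing from version A.
toy_similarity = [
    [0.9, 0.1, 0.1, 0.0],
    [0.1, 0.2, 0.8, 0.1],
    [0.0, 0.1, 0.1, 0.9],
]
toy_path = dtw_path(toy_similarity)
toy_matches = keep_confident_matches(toy_path, toy_similarity)
print(toy_path)     # monotonic path through the matrix
print(toy_matches)  # path cells kept as note matches
```

In this toy case the DTW path is forced to pass through the inserted note, and the threshold step then discards that low-similarity cell, so the extra note in version B stays unmatched. A monotonic path of this kind cannot follow repeats or skips on its own, which is one reason the learned similarities matter for the mismatch cases described above.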
