Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription
Zhanhong He, Roberto Togneri, David Huang
TL;DR
A modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems and outperforms existing score-informed methods and velocity-enabled AMT systems while adding only 1 M parameters.
Abstract
MIDI velocity is crucial for capturing expressive dynamics in human performances. In practical scenarios, a music score with inaccurate velocities may be available alongside the performance audio (e.g., in music education and free online archives), enabling the task of score-informed MIDI velocity estimation. In this work, we propose a modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems. We integrate the proposed module into multiple AMT systems (HPT, HPPNet, and DynEst). Trained exclusively on the MAESTRO training split, our method consistently reduces velocity estimation errors on MAESTRO and improves cross-dataset generalization to the SMD and MAPS datasets. Under this training protocol, integrating our score-informed module with HPT (named Score-HPT) establishes a new state of the art, outperforming existing score-informed methods and velocity-enabled AMT systems while adding only 1 M parameters.
