Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription

Zhanhong He, Roberto Togneri, David Huang

TL;DR

A modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems, outperforming existing score-informed methods and velocity-enabled AMT systems while adding only 1 M parameters.

Abstract

MIDI velocity is crucial for capturing expressive dynamics in human performances. In practical scenarios, a music score with inaccurate velocities may be available alongside the performance audio (e.g., in music education or in free online archives), enabling the task of score-informed MIDI velocity estimation. In this work, we propose a modular, lightweight score-informed Transformer correction module that refines the velocity estimates of Automatic Music Transcription (AMT) systems. We integrate the proposed module into multiple AMT systems (HPT, HPPNet, and DynEst). Trained exclusively on the MAESTRO training split, our method consistently reduces velocity estimation errors on MAESTRO and improves cross-dataset generalization to the SMD and MAPS datasets. Under this training protocol, integrating our score-informed module with HPT (named Score-HPT) establishes new state-of-the-art performance, outperforming existing score-informed methods and velocity-enabled AMT systems while adding only 1 M parameters.

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Comparison between the proposed and previous score-informed MIDI velocity estimation approaches.
  • Figure 2: Proposed score-informed MIDI velocity estimation framework. The system includes a velocity estimation module, which is identical to the velocity branch of the chosen AMT baseline. The correction module then rectifies the preliminary velocity estimates using features extracted from the MIDI score.
  • Figure 3: Proposed Conformer-like Transformer encoder.
  • Figure 4: Velocity estimates from the baseline HPT and the proposed Score-HPT. The refinement effect is evident, and both methods provide a fully-populated velocity map.
  • Figure 5: Comparison between HPT and Score-HPT variants using different score features. Training ran for 120k iterations, with metrics evaluated every 10k iterations. All models were trained exclusively on the MAESTRO train set.
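
To make the idea of a score-informed velocity readout concrete, here is a minimal sketch of how note-level velocities might be sampled from a frame-level velocity map at the onsets given by a score, and how an error against reference velocities could be computed. It is not the paper's implementation. The frame rate, the [0, 1] normalization of the map, the onset-frame sampling rule, and the function names `read_note_velocities` and `mean_absolute_error` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

FRAMES_PER_SECOND = 100  # assumed frame rate, not taken from the paper
LOWEST_MIDI_PITCH = 21   # A0, the lowest key of an 88-key piano


def read_note_velocities(
    velocity_map: Sequence[Sequence[float]],
    score_notes: Sequence[Tuple[float, int]],
    fps: int = FRAMES_PER_SECOND,
) -> List[int]:
    """Sample a (frames x 88) velocity map at each score note's onset.

    velocity_map holds values assumed to lie in [0, 1];
    score_notes is a list of (onset_seconds, midi_pitch) pairs.
    """
    velocities = []
    for onset_sec, midi_pitch in score_notes:
        # Clamp the onset frame so late onsets cannot index past the map.
        frame = min(int(round(onset_sec * fps)), len(velocity_map) - 1)
        key = midi_pitch - LOWEST_MIDI_PITCH
        value = velocity_map[frame][key]
        # Rescale to the MIDI velocity range 1..127.
        velocities.append(max(1, min(127, int(round(value * 127)))))
    return velocities


def mean_absolute_error(estimated: Sequence[int], reference: Sequence[int]) -> float:
    """Average absolute difference between estimated and reference velocities."""
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)
```

In this sketch the score supplies only note onsets and pitches, so velocity is read solely at positions where the score says a note begins. A correction module such as the one described above would operate before this readout, refining the map itself.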