Engraving Oriented Joint Estimation of Pitch Spelling and Local and Global Keys
Augustin Bouquillard, Florent Jacquemard
TL;DR
This paper tackles the intertwined problems of pitch spelling (PS) and key estimation (KE) from MIDI data with explicit measure boundaries. It introduces a dynamic-programming-based framework that jointly estimates spellings, a global key and local keys by minimizing the number of printed accidentals, using a two-stage procedure and measure-wise local analyses to refine spellings. The approach achieves high accuracy across diverse datasets (PS at 99.5% on the Bach corpus and 98.2% overall; global KE around 93% on average), and includes a faster deterministic variant, PS13b, suited to real-time or large-scale transcription workflows. The work demonstrates practical viability for music notation processing and suggests extensions to richer tonal models and other genres.
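To make the engraving criterion concrete, here is a minimal Python sketch of how printed accidentals might be counted and minimized within a single measure. It assumes a simplified convention: an accidental holds for the same letter and octave until the end of the measure, and every deviation from the key signature (including a natural) is printed. The key signature is encoded as a hypothetical integer `fifths` (positive for sharps, negative for flats), and the helpers `key_alter`, `candidates`, `count_accidentals` and `best_spelling` are illustrative names, not the authors' code; the authors describe their criterion only as "roughly" the accidental count.

```python
from typing import Dict, List, Tuple

Spelling = Tuple[str, int, int]  # (letter, alteration in semitones, octave)

LETTERS = "CDEFGAB"
NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
SHARP_ORDER = "FCGDAEB"  # order in which sharps enter a key signature
FLAT_ORDER = "BEADGCF"   # order in which flats enter a key signature


def key_alter(letter: str, fifths: int) -> int:
    """Alteration the key signature gives to `letter` (fifths > 0: sharps, < 0: flats)."""
    if fifths > 0 and letter in SHARP_ORDER[:fifths]:
        return 1
    if fifths < 0 and letter in FLAT_ORDER[:-fifths]:
        return -1
    return 0


def candidates(midi: int) -> List[Spelling]:
    """All spellings of a MIDI pitch using at most a double sharp or double flat."""
    out = []
    for letter in LETTERS:
        for alter in (-2, -1, 0, 1, 2):
            base = midi - alter - NATURAL_PC[letter]  # MIDI number of C in that octave
            if base % 12 == 0:
                out.append((letter, alter, base // 12 - 1))  # MIDI 60 is C4
    return out


def count_accidentals(measure: List[Spelling], fifths: int) -> int:
    """Printed accidentals in one measure; an accidental holds for the same
    letter and octave until the end of the measure (simplified convention)."""
    state: Dict[Tuple[str, int], int] = {}
    printed = 0
    for letter, alter, octave in measure:
        current = state.get((letter, octave), key_alter(letter, fifths))
        if alter != current:
            printed += 1
            state[(letter, octave)] = alter
    return printed


def best_spelling(measure: List[int], fifths: int) -> Tuple[int, List[Spelling]]:
    """Exact DP over spellings of one measure: partial spellings that leave the same
    accidental state have the same future cost, so only the cheapest one is kept."""
    frontier = {frozenset(): (0, [])}
    for midi in measure:
        nxt: Dict[frozenset, Tuple[int, List[Spelling]]] = {}
        for state, (cost, path) in frontier.items():
            for letter, alter, octave in candidates(midi):
                st = dict(state)
                current = st.get((letter, octave), key_alter(letter, fifths))
                printed = alter != current
                if printed:
                    st[(letter, octave)] = alter  # accidental now in force
                state_key = frozenset(st.items())
                new_cost = cost + int(printed)
                if state_key not in nxt or new_cost < nxt[state_key][0]:
                    nxt[state_key] = (new_cost, path + [(letter, alter, octave)])
        frontier = nxt
    return min(frontier.values(), key=lambda entry: entry[0])


if __name__ == "__main__":
    print(best_spelling([60, 64, 67], fifths=0))  # (0, [('C', 0, 4), ('E', 0, 4), ('G', 0, 4)])
    print(best_spelling([66], fifths=-6))         # (0, [('G', -1, 4)])
```

Under these assumptions the search is exact, because two partial spellings that reach the same accidental state have identical future costs. The state space can still grow quickly in dense passages, so a practical implementation would likely need a more compact state or some pruning.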
Abstract
We revisit the problems of pitch spelling and tonality guessing with a new algorithm for their joint estimation from a MIDI file that includes information about measure boundaries. Our algorithm identifies not only a global key but also local keys throughout the analyzed piece. It uses Dynamic Programming techniques to search for an optimal spelling in terms, roughly, of the number of accidental symbols that would be displayed in the engraved score. The evaluation of this number is coupled with an estimation of the global key and of local keys, one for each measure. Each of these three pieces of information is used to estimate the others, in a multi-step procedure. An evaluation conducted on a monophonic dataset and a piano dataset, comprising 216 464 notes in total, shows a high degree of accuracy, both for pitch spelling (99.5% on average on the Bach corpus and 98.2% on the whole dataset) and for global key signature estimation (93.0% on average, 95.58% on the piano dataset). Originally designed as a back-end tool in a music transcription framework, this method should also be useful for other tasks related to music notation processing.
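The coupling between per-measure spelling costs and key estimation can be illustrated with another hedged sketch. Assuming a table `costs[m][j]` giving the number of accidentals in measure `m` when it is spelled under candidate key signature `keys[j]` (expressed in fifths), a global key can be chosen as the signature with the lowest total, and a sequence of local keys, one per measure, can be found with a Viterbi-style shortest path that charges a hypothetical `penalty` per step on the line of fifths whenever the key changes between consecutive measures. This is an illustrative formulation with made-up names (`global_key`, `local_keys`), not necessarily the procedure used by the authors.

```python
from typing import List, Sequence


def global_key(costs: Sequence[Sequence[int]], keys: Sequence[int]) -> int:
    """Key signature (in fifths) with the fewest accidentals summed over all measures;
    ties go to the signature with fewer sharps or flats."""
    totals = [sum(row[j] for row in costs) for j in range(len(keys))]
    best = min(range(len(keys)), key=lambda j: (totals[j], abs(keys[j])))
    return keys[best]


def local_keys(costs: Sequence[Sequence[int]], keys: Sequence[int],
               penalty: float = 1.0) -> List[int]:
    """One key per measure minimizing accidentals plus `penalty` times the distance
    on the line of fifths between consecutive keys (Viterbi-style DP)."""
    if not costs:
        return []
    k = len(keys)
    score = [float(c) for c in costs[0]]  # best total ending in key j at measure 0
    back: List[List[int]] = []            # back[m - 1][j]: best key at m - 1 before key j at m
    for row in costs[1:]:
        new_score, pointers = [], []
        for j in range(k):
            step = [score[i] + penalty * abs(keys[i] - keys[j]) for i in range(k)]
            i_best = min(range(k), key=step.__getitem__)
            new_score.append(step[i_best] + row[j])
            pointers.append(i_best)
        score = new_score
        back.append(pointers)
    # Backtrack from the cheapest final key.
    j = min(range(k), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [keys[j] for j in path]


if __name__ == "__main__":
    keys = [0, 1, 2]  # C, G and D major signatures
    costs = [[0, 2, 3], [0, 2, 3], [3, 2, 0], [3, 2, 0]]
    print(global_key(costs, keys))             # 0 (tie with D broken by fewer sharps)
    print(local_keys(costs, keys, penalty=1))  # [0, 0, 2, 2]
```

With `penalty=1` the first two measures stay in C and the last two move to D; with a much larger penalty the sketch prefers a single key throughout, which shows how such a parameter would trade engraving cost against the stability of local keys.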
