
Detecting Notational Errors in Digital Music Scores

Léo Géré, Nicolas Audebert, Florent Jacquemard

TL;DR

This work tackles the challenge of notational errors in digital music scores by introducing a two-step, modular error-detection pipeline. The first step enforces rhythm/time consistency within MusicXML, while the second step applies format-agnostic, rule-based contextual validation via a tokenized state machine. Applied to the ASAP piano-score dataset, the approach identified notational errors in about 42% of scores and guided manual fixes that improved overall data quality, demonstrating the method's practicality for data curation in music information retrieval. The framework is designed to be extensible to additional formats and rules, offering a foundation for more comprehensive score validation beyond MusicXML.
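To make the first step concrete, here is a minimal sketch of a rhythm/time consistency check on MusicXML. It assumes the `<divisions>` value (divisions per quarter note) is already known and that a note's theoretical length depends only on its `<type>`, its `<dot>` elements, and an optional `<time-modification>`. The function names `expected_duration` and `check_measure` are hypothetical; this is an illustration, not the authors' implementation. It compares each encoded `<duration>` with the exact value implied by the notation, using `fractions.Fraction` so that tuplet durations are not rounded.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# MusicXML <type> values -> length in quarter notes (subset, for illustration).
TYPE_IN_QUARTERS = {
    "breve": Fraction(8), "whole": Fraction(4), "half": Fraction(2),
    "quarter": Fraction(1), "eighth": Fraction(1, 2), "16th": Fraction(1, 4),
    "32nd": Fraction(1, 8), "64th": Fraction(1, 16), "128th": Fraction(1, 32),
}

def expected_duration(note: ET.Element, divisions: int) -> Fraction:
    """Theoretical <duration> in divisions, implied by <type>, <dot> and <time-modification>."""
    base = TYPE_IN_QUARTERS[note.findtext("type")]
    dots = len(note.findall("dot"))
    # Each dot adds half of the previous value: base * (2 - 1/2**dots).
    length = base * (2 - Fraction(1, 2 ** dots))
    tm = note.find("time-modification")
    if tm is not None:
        # actual-notes are played in the time of normal-notes.
        length *= Fraction(int(tm.findtext("normal-notes")), int(tm.findtext("actual-notes")))
    return length * divisions

def check_measure(measure_xml: str, divisions: int) -> list[str]:
    """Return one message per note whose encoded duration disagrees with its notation."""
    measure = ET.fromstring(measure_xml.strip())
    errors = []
    for i, note in enumerate(measure.findall("note")):
        # Skip grace notes (no <duration>) and notes without <type> (e.g. measure rests).
        if note.find("type") is None or note.find("duration") is None:
            continue
        encoded = int(note.findtext("duration"))
        expected = expected_duration(note, divisions)
        if encoded != expected:
            errors.append(f"note {i}: duration {encoded} != expected {float(expected):.2f}")
    return errors

SAMPLE = """
<measure number="1">
  <note><pitch><step>C</step><octave>4</octave></pitch>
    <duration>720</duration><voice>1</voice><type>quarter</type><dot/></note>
  <note><pitch><step>D</step><octave>4</octave></pitch>
    <duration>200</duration><voice>1</voice><type>eighth</type></note>
</measure>
"""

print(check_measure(SAMPLE, divisions=480))
# → ['note 1: duration 200 != expected 240.00']
```

Keeping the expected value as a `Fraction` makes the mismatch test exact rather than tolerance-based, which is what lets a value such as 34 for a 14:8 tuplet 32nd note be reported at all.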

Abstract

Music scores are used to precisely record musical pieces for transmission and preservation. To represent and manipulate these complex objects, various formats have been tailored to different use cases. While music notation follows specific rules, digital formats usually enforce them leniently. Hence, digital music scores vary widely in quality, due to software and format specificities, conversion issues, and dubious user inputs. Problems range from minor engraving discrepancies to major notation mistakes. Yet data quality is a major concern for musical information extraction and retrieval. We present an automated approach to detect notational errors, aiming to precisely localize defects in scores. We identify two types of errors: i) rhythm/time inconsistencies in the encoding of individual musical elements, and ii) contextual errors, i.e. notation mistakes that break commonly accepted musical rules. We implement the detection of the latter using a modular state machine that can easily be extended with rules representing the usual conventions of common Western music notation. Finally, we apply this error-detection method to the piano score dataset ASAP. We find that around 40% of the scores contain at least one notational error, and we manually fix several of them to enhance the dataset's quality.
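The contextual checks run a state machine over a tokenized score, with guard conditions deciding when a token violates a rule. The sketch below is a hypothetical, much simplified version of that idea: the `Token` fields, `State`, `RULES`, and both rules are illustrative assumptions, with onsets measured in quarter notes from the start of the current measure and a "note" token standing for a note or chord event. It only shows how guards over the current voice state (`v = S_voices[T_voice]`) can be kept as independent, pluggable rules.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable, Optional

@dataclass
class Token:
    kind: str                         # hypothetical token types: "measure" or "note"
    onset: Fraction = Fraction(0)     # offset in quarter notes from the measure start
    duration: Fraction = Fraction(0)  # for "measure" tokens: the nominal measure length
    voice: Optional[int] = None       # only set for voice-bound tokens

@dataclass
class State:
    measure_length: Fraction = Fraction(4)
    voice_end: dict = field(default_factory=dict)  # voice -> time its last note ends
    errors: list = field(default_factory=list)

# A rule pairs a guard (True means "violation") with an error message.
Rule = tuple[Callable[[State, Token], bool], str]

def overlaps_in_voice(s: State, t: Token) -> bool:
    # v = S_voices[T_voice]: a note may not start before the voice's previous note ends.
    return t.kind == "note" and t.onset < s.voice_end.get(t.voice, Fraction(0))

def exceeds_measure(s: State, t: Token) -> bool:
    # A note may not extend past the end of the current measure.
    return t.kind == "note" and t.onset + t.duration > s.measure_length

RULES: list[Rule] = [
    (overlaps_in_voice, "overlapping notes in voice"),
    (exceeds_measure, "note extends past the barline"),
]

def step(s: State, t: Token, index: int, rules: list[Rule]) -> None:
    """One transition: evaluate every guard, then update the state."""
    for guard, message in rules:
        if guard(s, t):
            s.errors.append(f"token {index}: {message}")
    if t.kind == "measure":
        s.measure_length = t.duration
        s.voice_end.clear()  # every voice restarts at the barline
    elif t.kind == "note":
        s.voice_end[t.voice] = t.onset + t.duration

def validate(tokens: list[Token], rules: list[Rule] = RULES) -> list[str]:
    s = State()
    for i, t in enumerate(tokens):
        step(s, t, i, rules)
    return s.errors

tokens = [
    Token("measure", duration=Fraction(3)),                   # a 3/4 measure
    Token("note", Fraction(0), Fraction(3, 2), voice=1),      # dotted quarter
    Token("note", Fraction(1), Fraction(1, 2), voice=1),      # starts too early in voice 1
    Token("note", Fraction(3, 2), Fraction(3, 2), voice=1),
]
print(validate(tokens))
# → ['token 2: overlapping notes in voice']
```

Because each rule is just a `(guard, message)` pair, extending the checker with another notation convention amounts to appending to the rule list, without touching the transition logic.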

Paper Structure

This paper contains 29 sections, 16 figures, 2 tables.

Figures (16)

  • Figure 1: Example of the same erroneous MusicXML file (presented in the appendix) rendered by three different score rendering/editing applications.
  • Figure 2: Example of overlapping notes in measure 356 of the 3rd movement of Beethoven's Piano Sonata No. 17 in the ASAP dataset (Foscarin et al.). The notes are colored by voice. The notes in the green voice overlap with each other: no other note should start in that voice before the first dotted quarter note ends. This issue might come from a conversion error in the history of the score that merged two voices into one.
  • Figure 3: Example of tokenization of a score. The timeline represents the sequence of tokens. The items' shape represents the type of token, and their color the voice they belong to (only for tokens tied to a voice).
  • Figure 4: Guard condition examples for a token $T$ in a state $S$, where $v = S_\text{voices}[T_\text{voice}]$.
  • Figure 5: Example of a tuplet rounding error in measure 94 of Beethoven's Sonata No. 17, op. 32, 2nd movement. With 480 divisions per quarter note, the theoretical duration of a 32nd note in a 14:8 tuplet should be $\frac{1}{8} \times \frac{8}{14} \times 480 = \frac{240}{7} \approx 34.3 \neq 34$.
  • ...and 11 more figures
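The Figure 5 arithmetic also shows why the rounded value matters and how it could be avoided. Assuming, for illustration only, a complete group of fourteen such 32nd notes filling the span of eight ordinary 32nd notes (one quarter note), each encoded with the rounded value 34:

$$
\begin{aligned}
d_{\text{exact}} &= \tfrac{1}{8} \times \tfrac{8}{14} \times 480 = \tfrac{240}{7} \approx 34.29,\\
14 \times d_{\text{exact}} &= 480 \quad\text{whereas}\quad 14 \times 34 = 476,\\
\text{with } \texttt{divisions} = 3360 = 7 \times 480:\quad d_{\text{exact}} &= \tfrac{1}{8} \times \tfrac{8}{14} \times 3360 = 240.
\end{aligned}
$$

Under that assumption the group would fall 4 divisions short of the quarter note it should fill. Since the exact duration equals $\texttt{divisions}/14$, choosing a `<divisions>` value that is a multiple of 14 makes it an integer and removes the rounding altogether.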