Optical Music Recognition in Manuscripts from the Ricordi Archive

Federico Simonetta, Rishav Mondal, Luca Andrea Ludovico, Stavros Ntalampiras

TL;DR

This paper develops an Optical Music Recognition workflow for historical Ricordi manuscripts by creating a large, labeled dataset of handwritten musical symbols, applying staff-line removal and blob detection to extract symbols, and evaluating CNN-based classifiers and AutoML. It demonstrates that CNNs can reliably distinguish musically relevant vs. irrelevant symbols with ~85% balanced accuracy in the binary task, and provides uncertainty-based filtering to improve precision. The work contributes a publicly available dataset of ~198k labeled blobs and a replicable preprocessing and training pipeline, enabling automatic annotation of the remaining Ricordi archive pages. This has practical impact for musicology and digital libraries by accelerating large-scale digitization and searchability of historical scores.
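The uncertainty-based filtering mentioned above can be illustrated with a small sketch. It assumes that confidence is simply the highest class probability output by the classifier, which may differ from the uncertainty estimate actually used in the paper. The function names and the toy probabilities are hypothetical and serve only to show the trade-off between balanced accuracy and the share of retained data.

```python
from typing import List, Optional, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class recalls, over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def filter_by_confidence(
    probs: Sequence[List[float]], y_true: Sequence[int], threshold: float
) -> Tuple[Optional[float], float]:
    """Keep samples whose top class probability reaches the threshold.

    Assumption: confidence = max class probability (not necessarily the
    paper's measure). Returns (balanced accuracy on the retained samples,
    fraction of samples retained).
    """
    kept_true, kept_pred = [], []
    for p, t in zip(probs, y_true):
        conf = max(p)
        if conf >= threshold:
            kept_true.append(t)
            kept_pred.append(p.index(conf))  # predicted class = argmax
    if not kept_true:
        return None, 0.0
    return balanced_accuracy(kept_true, kept_pred), len(kept_true) / len(y_true)


# Toy binary example (made-up numbers, not data from the paper):
# class 0 = irrelevant blob, class 1 = musically relevant blob.
probs = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90],
         [0.45, 0.55], [0.55, 0.45], [0.20, 0.80]]
labels = [0, 1, 1, 0, 0, 1]

for thr in (0.5, 0.75):
    ba, retained = filter_by_confidence(probs, labels, thr)
    print(f"threshold={thr}: balanced accuracy={ba:.3f}, retained={retained:.0%}")
```

Raising the threshold discards the low-confidence predictions, so the balanced accuracy on the retained samples tends to rise while the retained share falls; choosing the threshold is a matter of how much data one is willing to leave unlabeled.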

Abstract

The Ricordi archive, a prestigious collection of significant musical manuscripts from renowned opera composers such as Donizetti, Verdi and Puccini, has been digitized. This process has allowed us to automatically extract samples that represent various musical elements depicted on the manuscripts, including notes, staves, clefs, erasures, and composers' annotations, among others. To distinguish between digitization noise and actual music elements, a subset of these images was meticulously grouped and labeled by multiple individuals into several classes. After assessing the consistency of the annotations, we trained multiple neural network-based classifiers to differentiate between the identified music elements. The primary objective of this study was to evaluate the reliability of these classifiers, with the ultimate goal of using them for the automatic categorization of the remaining unannotated dataset. The dataset, manual annotations, models, and source code used in these experiments are publicly accessible for replication purposes.

Paper Structure (8 sections, 2 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Screenshot of the interface used for annotating the dataset. The text is in Italian.
  • Figure 2: Original distribution of the blob images across the classes before merging the less frequent ones. Note that the Y axis is in log scale.
  • Figure 3: Distribution of the images across the classes, after the merging of the less frequent classes. Note that the Y axis is in log scale.
  • Figure 4: Examples of blobs for each class in the dataset.
  • Figure 5: Trend of the balanced accuracy and percentage of retained test data at various confidence levels for the binary task.
  • ...and 1 more figure