The Renaissance of Expert Systems: Optical Recognition of Printed Chinese Jianpu Musical Scores with Lyrics

Fan Bu, Rongfeng Li, Zijin Li, Ya Li, Linfeng Fan, Pei Huang

TL;DR

The paper tackles the underexplored problem of optical recognition for printed Chinese Jianpu (numbered notation) scores with lyrics. It introduces a modular expert-system pipeline that combines traditional computer-vision priors with unsupervised deep-learning embeddings to convert printed Jianpu scores into MusicXML and MIDI without requiring large labeled datasets. Evaluated on The Anthology of Chinese Folk Songs, the system reaches a note-wise F1 of 0.951 on melody and a character-wise F1 of 0.931 on aligned lyrics, and it digitizes more than 5,000 melody-only songs plus a curated lyric-equipped subset of over 1,400 songs. By delivering both large-scale datasets and a practical recognition framework, the work provides a data-efficient benchmark and a foundation for downstream music-AI applications such as MusicBERT.

Abstract

Large-scale optical music recognition (OMR) research has focused mainly on Western staff notation, leaving Chinese Jianpu (numbered notation) and its rich lyric resources underexplored. We present a modular expert-system pipeline that converts printed Jianpu scores with lyrics into machine-readable MusicXML and MIDI, without requiring massive annotated training data. Our approach adopts a top-down expert-system design, leveraging traditional computer-vision techniques (e.g., phase correlation, skeleton analysis) to capitalize on prior knowledge, while integrating unsupervised deep-learning modules for image feature embeddings. This hybrid strategy strikes a balance between interpretability and accuracy. Applied to The Anthology of Chinese Folk Songs, our system digitizes at scale (i) a melody-only collection of more than 5,000 songs (> 300,000 notes) and (ii) a curated subset with lyrics comprising over 1,400 songs (> 100,000 notes). The system achieves high recognition accuracy on both melody (note-wise F1 = 0.951) and aligned lyrics (character-wise F1 = 0.931).
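The abstract reports note-wise and character-wise F1 but does not spell out the matching protocol here. As a minimal sketch, assuming recognized symbols are matched to the reference by longest common subsequence (an assumption for illustration, not necessarily the authors' procedure), a sequence-level F1 could be computed as follows. The names `lcs_length` and `sequence_f1` and the example note tuples are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def lcs_length(ref: Sequence[Hashable], pred: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of ref and pred."""
    prev = [0] * (len(pred) + 1)
    for r in ref:
        curr = [0]
        for j, p in enumerate(pred, start=1):
            if r == p:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_f1(ref: Sequence[Hashable],
                pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under LCS matching (assumed protocol)."""
    matched = lcs_length(ref, pred)
    if matched == 0:
        return 0.0, 0.0, 0.0
    precision = matched / len(pred)  # share of predicted symbols that are correct
    recall = matched / len(ref)      # share of reference symbols that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical notes as (Jianpu degree, duration in beats) pairs.
ref_notes = [("5", 1.0), ("3", 0.5), ("2", 0.5), ("1", 1.0), ("6", 1.0)]
pred_notes = [("5", 1.0), ("3", 0.5), ("1", 1.0), ("6", 0.5)]
note_scores = sequence_f1(ref_notes, pred_notes)

# Lyrics can be scored the same way, one character per symbol.
lyric_scores = sequence_f1("茉莉花开", "茉莉化开")
```

In this example a note counts as correct only if both its degree and its duration match, so the final note (right degree, wrong duration) is scored as an error. A real evaluation would also have to decide how octave dots, ties, and lyric alignment enter the comparison.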

Paper Structure

This paper contains 22 sections, 9 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: A phrase of Jianpu and the equivalent staff notation.
  • Figure 2: A diagram of our Jianpu OMR pipeline.
  • Figure 3: Examples of the adaptive lighting correction process using dual gamma transform.
  • Figure 4: Demonstration of entropy-minimization rotation correction.
  • Figure 5: (a) An example digit image; (b) LoG-filtered; (c) manually enhanced templates.
  • ...and 7 more figures
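
Figure 4 refers to entropy-minimization rotation correction, but this summary does not give the details. One common formulation, sketched below as an assumption rather than the paper's exact method, tries a range of candidate angles and keeps the one whose horizontal projection profile (ink count per row) has the lowest Shannon entropy, since well-aligned text rows concentrate ink in few rows. The toy page, the angle grid, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def rotate(points: Iterable[Point], degrees: float) -> List[Point]:
    """Rotate (x, y) ink coordinates about the origin by the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def row_profile_entropy(points: Iterable[Point]) -> float:
    """Shannon entropy of the distribution of ink over integer pixel rows."""
    rows = Counter(round(y) for _, y in points)
    total = sum(rows.values())
    return -sum((n / total) * math.log(n / total) for n in rows.values())


def estimate_correction(points: List[Point], candidates: List[float]) -> float:
    """Pick the candidate rotation that minimizes row-profile entropy."""
    return min(candidates, key=lambda a: row_profile_entropy(rotate(points, a)))


# Toy "page": three perfectly horizontal rows of ink, then skewed by 3 degrees.
page = [(float(x), float(y)) for y in (20, 40, 60) for x in range(200)]
skewed = rotate(page, 3.0)

# Search corrections from -5 to +5 degrees in 0.5 degree steps.
candidates = [-5.0 + 0.5 * i for i in range(21)]
best = estimate_correction(skewed, candidates)
```

On this toy input the search selects -3.0, the inverse of the applied skew. A practical version would work on binarized scan pixels, use a coarse-to-fine angle search, and rotate about the page center.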