The Renaissance of Expert Systems: Optical Recognition of Printed Chinese Jianpu Musical Scores with Lyrics
Fan Bu, Rongfeng Li, Zijin Li, Ya Li, Linfeng Fan, Pei Huang
TL;DR
The paper tackles the underexplored problem of optical recognition for printed Chinese Jianpu (numbered notation) scores with lyrics. It introduces a modular expert-system pipeline that combines traditional computer-vision priors with unsupervised deep-learning embeddings to convert printed Jianpu scores into MusicXML and MIDI without requiring large labeled datasets. On The Anthology of Chinese Folk Songs, the pipeline digitizes more than 5,000 melody-only songs and a curated subset of over 1,400 songs with lyrics, reaching a note-wise F1 of 0.951 on melody and a character-wise F1 of 0.931 on lyrics. By delivering both large-scale datasets and a practical recognition framework, the work provides a data-efficient benchmark and a foundation for downstream music-AI applications such as MusicBERT.
Abstract
Large-scale optical music recognition (OMR) research has focused mainly on Western staff notation, leaving Chinese Jianpu (numbered notation) and its rich lyric resources underexplored. We present a modular expert-system pipeline that converts printed Jianpu scores with lyrics into machine-readable MusicXML and MIDI without requiring massive annotated training data. Our approach adopts a top-down expert-system design: it uses traditional computer-vision techniques (e.g., phase correlation, skeleton analysis) to capitalize on prior knowledge, and integrates unsupervised deep-learning modules for image feature embeddings. This hybrid strategy balances interpretability and accuracy. Applied to The Anthology of Chinese Folk Songs, our system digitizes at scale (i) a melody-only collection of more than 5,000 songs (> 300,000 notes) and (ii) a curated subset with lyrics comprising over 1,400 songs (> 100,000 notes). The system achieves high recognition accuracy on both melody (note-wise F1 = 0.951) and aligned lyrics (character-wise F1 = 0.931).
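For readers less familiar with the reported metric, the sketch below shows how an F1 score is conventionally derived from counts of correct, spurious, and missed symbols. It is a minimal illustration only: the function name `f1_from_counts` and the example counts are hypothetical, and the abstract does not specify how recognized notes or characters are matched against the ground truth.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from symbol-level counts.

    tp: symbols recognized correctly (true positives)
    fp: symbols output by the system but absent from the reference
    fn: reference symbols the system missed
    How a symbol is judged "correct" is an assumption left to the caller.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*tp / (2*tp + fp + fn).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen for illustration, not taken from the paper.
scores = f1_from_counts(tp=90, fp=10, fn=10)
print(scores)
```

Applied note-wise, each note would count as one symbol; applied character-wise, each lyric character would.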
