Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline

Xavier Riley, Simon Dixon

TL;DR

This work tackles automatic transcription of jazz solos from audio to readable scores, using the Charlie Parker Omnibook as a stringent benchmark. It proposes a modular pipeline that combines saxophone source separation (Demucs), a saxophone-specific MIDI transcription model trained on the FiloSax dataset, and a qparse-based score-layout stage with a Parker-based grammar, along with an enhanced dataset of 50 score-audio pairs with aligned MIDI and downbeats. The authors report the strongest performance on saxophone separation and MIDI transcription for Parker material, while acknowledging that beat tracking and full audio-to-score accuracy remain challenging because of swing and expressive timing in jazz. The work contributes a public dataset, model checkpoints, and reusable code, advancing toward scalable audio-to-score transcription for music education and preservation.

Abstract

The Charlie Parker Omnibook is a cornerstone of jazz music education, described by pianist Ethan Iverson as "the most important jazz education text ever published". In this work we propose a new transcription pipeline and explore the extent to which state-of-the-art music technology is able to reconstruct these scores directly from the audio without human intervention. Our pipeline includes a newly trained source separation model for saxophone, a new MIDI transcription model for solo saxophone, and an adaptation of an existing MIDI-to-score method for monophonic instruments. To assess this pipeline we also provide an enhanced dataset of Charlie Parker transcriptions as score-audio pairs with accurate MIDI alignments and downbeat annotations. This represents a challenging new benchmark for automatic audio-to-score transcription that we hope will advance research into areas beyond transcribing audio-to-MIDI alone. Together, these form another step towards producing scores that musicians can use directly, without the need for onerous corrections or revisions. To facilitate future research, all model checkpoints and data are made available to download along with code for the transcription pipeline. Improvements in our modular pipeline could one day make the automatic transcription of complex jazz solos a routine possibility, thereby enriching the resources available for music education and preservation.
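
To illustrate why downbeat annotations matter when moving from MIDI to a score, the following is a minimal sketch that maps a note onset in seconds to a bar and beat position. It is not the authors' MIDI-to-score method: the function names, the 4/4 default, and the assumption of constant tempo within each bar are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def onset_to_bar_position(onset, downbeats, beats_per_bar=4):
    """Map an onset time in seconds to (bar_index, beat_within_bar).

    Illustrative assumption: tempo is constant inside each bar, so the
    position is found by linear interpolation between two downbeats.
    """
    if len(downbeats) < 2:
        raise ValueError("need at least two downbeats")
    if not downbeats[0] <= onset < downbeats[-1]:
        raise ValueError("onset lies outside the annotated bars")
    # Index of the last downbeat at or before the onset.
    bar = bisect_right(downbeats, onset) - 1
    start, end = downbeats[bar], downbeats[bar + 1]
    fraction = (onset - start) / (end - start)
    return bar, fraction * beats_per_bar


def snap_to_grid(beat, subdivisions=4):
    """Round a beat position to the nearest 1/subdivisions of a beat."""
    return round(beat * subdivisions) / subdivisions


if __name__ == "__main__":
    downbeats = [0.0, 2.0, 4.0]  # made-up downbeat times in seconds
    bar, beat = onset_to_bar_position(3.1, downbeats)
    print(bar, snap_to_grid(beat))
```

A fixed rounding grid like this handles swing and expressive timing poorly, which is one reason a grammar-based parsing stage is attractive for jazz material.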

Paper Structure (18 sections, 2 figures, 5 tables)

This paper contains 18 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: An extract of "Marmaduke" with ground truth on the upper staff and our transcription on the lower staff.
  • Figure 2: An extract of "The Bird" with ground truth on the upper staff and our transcription on the lower staff.