Audio-to-Score Conversion Model Based on Whisper methodology

Hongyao Zhang, Bohang Sun

TL;DR

This work introduces “Orpheus' Score”, a custom notation system that converts music information into tokens, together with a custom vocabulary and a corresponding tokenizer trained on it.

Abstract

This thesis develops a Whisper-based Transformer model that extracts melodies and chords from music audio and transcribes them into ABC notation. A data processing workflow tailored to ABC notation covers data cleansing, formatting, and conversion, and a mutation mechanism increases the diversity and quality of the training data. The thesis also introduces "Orpheus' Score", a custom notation system that converts music information into tokens, together with a custom vocabulary and a corresponding tokenizer trained on it. Experiments show that the model significantly improves accuracy and performance compared with traditional algorithms. Besides offering music enthusiasts a convenient audio-to-score tool, this work provides new ideas and tools for research in music information processing.

Paper Structure

This paper contains 18 sections, 1 equation, 7 figures, 1 table.

Figures (7)

  • Figure 1: Data Cleansing
  • Figure 2: Format the key
  • Figure 3: Format the meter
  • Figure 4: Mutation
  • Figure 5: Tokenize into Orpheus' Score
  • ...and 2 more figures