YNote: A Novel Music Notation for Fine-Tuning LLMs in Music Generation

Shao-Chien Lu, Chen-Chen Yeh, Hui-Lin Cho, Chun-Chieh Hsu, Tsai-Ling Hsu, Cheng-Han Wu, Timothy K. Shih, Yu-Cheng Lin

TL;DR

This work addresses the difficulty of fine-tuning LLMs for music generation with existing notations by introducing YNote, a fixed-format notation that encodes each note as two components, pitch and duration, in two characters each (four characters per note). After converting 190 Jiangnan-style tunes into YNote and fine-tuning GPT-2 (124M), the authors show that a short prompt (e.g., the first bar, or the first and last notes of each bar) can yield coherent music, achieving BLEU = $0.883$ and ROUGE = $0.766$. The approach requires minimal normalization (roughly $1.6\%$–$2.2\%$ edits) and runs efficiently on a single RTX 4070 Ti, underscoring YNote’s practicality for machine learning. Overall, YNote presents a practical alternative to traditional music notations for ML applications, with potential to improve style-controlled music generation and learning efficiency.

Abstract

The field of music generation using Large Language Models (LLMs) is evolving rapidly, yet existing music notation systems, such as MIDI, ABC Notation, and MusicXML, remain too complex for effective fine-tuning of LLMs. These formats are difficult for both machines and humans to interpret due to their variability and intricate structure. To address these challenges, we introduce YNote, a simplified music notation system that uses only four characters to represent a note and its pitch. YNote's fixed format ensures consistency, making it easy to read and more suitable for fine-tuning LLMs. In our experiments, we fine-tuned GPT-2 (124M) on a YNote-encoded dataset and achieved BLEU and ROUGE scores of 0.883 and 0.766, respectively. With just two notes as prompts, the model was able to generate coherent and stylistically relevant music. We believe YNote offers a practical alternative to existing music notations for machine learning applications and has the potential to significantly enhance the quality of music generation using LLMs.
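
To illustrate why a fixed-width format is convenient for preprocessing, here is a minimal sketch of splitting a fixed-width note string into per-note fields. It relies only on the four-characters-per-note property stated in the abstract. The two-plus-two split between pitch and duration, the example symbols (`C4`, `08`), and the function names are assumptions made for illustration and are not taken from the paper's specification.

```python
from typing import List, Tuple

NOTE_WIDTH = 4   # four characters per note, as stated in the abstract
PITCH_WIDTH = 2  # assumption: the first two characters carry the pitch


def split_notes(encoded: str) -> List[Tuple[str, str]]:
    """Split a fixed-width string into (pitch, duration) pairs.

    The field layout is hypothetical; only the fixed note width
    comes from the source text.
    """
    if len(encoded) % NOTE_WIDTH != 0:
        raise ValueError("length must be a multiple of the note width")
    return [
        (encoded[i:i + PITCH_WIDTH], encoded[i + PITCH_WIDTH:i + NOTE_WIDTH])
        for i in range(0, len(encoded), NOTE_WIDTH)
    ]


def join_notes(notes: List[Tuple[str, str]]) -> str:
    """Inverse of split_notes: concatenate the pairs back into one string."""
    for pitch, duration in notes:
        if len(pitch) != PITCH_WIDTH or len(duration) != NOTE_WIDTH - PITCH_WIDTH:
            raise ValueError("every field must have its fixed width")
    return "".join(pitch + duration for pitch, duration in notes)


# Placeholder symbols, not actual YNote tokens.
example = "C408D404"
pairs = split_notes(example)
assert join_notes(pairs) == example
```

Because every note occupies the same number of characters, no delimiter parsing or lookahead is needed, and malformed input can be rejected with a single length check.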

Paper Structure

This paper contains 16 sections and 7 figures.

Figures (7)

  • Figure 1: Boat on Tai Lake in Various Music Notations
  • Figure 2: Overview of YNote Format
  • Figure 3: Flowchart for Generating Music in YNote Format
  • Figure 4: Qualitative Evaluation of Fine-Tuned GPT-2 Models Using the First Bar as Prompt
  • Figure 5: Qualitative Evaluation of Fine-Tuned GPT-2 Models with the First and Last Notes of Each Bar as Prompt
  • ...and 2 more figures