Difficulty-Controlled Simplification of Piano Scores with Synthetic Data for Inclusive Music Education

Pedro Ramoneda, Emilia Parada-Cabaleiro, Dasaem Jeong, Xavier Serra

TL;DR

This work tackles the problem of making piano education more inclusive by enabling difficulty-controlled score generation using open, public MusicXML data. It introduces PianoPairs, a synthetic dataset of paired piano scores with the same melody/harmony but varying difficulty, and trains a transformer model with melody and harmony conditioning as well as a supervised difficulty-adaptation objective using a SEP token. The approach leverages Linearized MusicXML (LMX) for efficient sequence modeling, employs a Gaussian Naive Bayes-based difficulty estimator and CLaMP-3 style embeddings to curate high-quality pairs, and demonstrates improved control over playability and style through objective metrics and expert evaluations. By releasing code, data, and models, the work advances reproducible research in difficulty-aware music generation and supports Teacher-in-the-Loop educational paradigms for broader accessibility.

Abstract

Despite its potential, AI advances in music education are hindered by proprietary systems that limit the democratization of technology in this domain. In particular, AI-driven music difficulty adjustment is especially promising, as simplifying complex pieces can make music education more inclusive and accessible to learners of all ages and contexts. Nevertheless, recent efforts have relied on proprietary datasets, which prevents the research community from reproducing, comparing, or extending the current state of the art. In addition, while these generative methods offer great potential, most of them use the MIDI format, which, unlike others, such as MusicXML, lacks readability and layout information, thereby limiting their practical use for human performers. This work introduces a transformer-based method for adjusting the difficulty of MusicXML piano scores. Unlike previous methods, which rely on annotated datasets, we propose a synthetic dataset composed of pairs of piano scores ordered by estimated difficulty, with each pair comprising a more challenging and easier arrangement of the same piece. We generate these pairs by creating variations conditioned on the same melody and harmony and leverage pretrained models to assess difficulty and style, ensuring appropriate pairing. The experimental results illustrate the validity of the proposed approach, showing accurate control of playability and target difficulty, as highlighted through qualitative and quantitative evaluations. In contrast to previous work, we openly release all resources (code, dataset, and models), ensuring reproducibility while fostering open-source innovation to help bridge the digital divide.
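The pairing step described above (assess difficulty and style with pretrained models, then keep only suitable pairs) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Variation` record, the `curate_pairs` helper, the cosine-similarity criterion, and both thresholds are assumptions made for illustration, and the difficulty scores and style embeddings are toy values standing in for the outputs of a difficulty estimator and a style-embedding model.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass
class Variation:
    """One generated arrangement of a piece (hypothetical record)."""
    name: str
    difficulty: float  # assumed output of a difficulty estimator
    style: tuple       # assumed style embedding (toy 2-D vector here)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def curate_pairs(variations, min_style_sim=0.9, min_gap=1.0):
    """Return (harder, easier) name pairs that are close in style
    and separated in estimated difficulty. Thresholds are assumptions."""
    pairs = []
    for a, b in combinations(variations, 2):
        # Discard pairs whose style embeddings diverge too much.
        if cosine(a.style, b.style) < min_style_sim:
            continue
        # Order the pair so the more difficult arrangement comes first.
        hard, easy = (a, b) if a.difficulty > b.difficulty else (b, a)
        # Keep the pair only if the difficulty gap is large enough.
        if hard.difficulty - easy.difficulty >= min_gap:
            pairs.append((hard.name, easy.name))
    return pairs


# Toy variations of the same piece (same melody and harmony assumed).
variations = [
    Variation("v1", 6.0, (1.0, 0.0)),
    Variation("v2", 3.0, (0.9, 0.1)),
    Variation("v3", 2.5, (0.0, 1.0)),
]
pairs = curate_pairs(variations)
print(pairs)
```

In this toy run only `v1` and `v2` are stylistically close enough to be paired, and `v1` is listed first because its estimated difficulty is higher. Raising `min_gap` or `min_style_sim` makes the curation stricter at the cost of fewer pairs.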

Paper Structure

This paper contains 19 sections, 14 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Accessible Music through Difficulty-Aware Generation. One musical idea, adapted to each player. AI difficulty control enables everyone to participate and learn.
  • Figure 2: Proposed Training Approach for Adaptive Score Generation.
  • Figure 3: Comparison of input representations. Above: two measures of a piano score. Below: MusicXML (left); Linearized MusicXML -- LMX (right) token sequence.
  • Figure 4: Input representation for training the model conditioned on melody and harmony. The input sequence consists of conditioning tokens (melody and harmony) followed by the score tokens.
  • Figure 5: Percentage of $\downarrow$ (easier) adaptations per genre for each experiment on the original benchmark. Axes represent genres; values indicate the percentage of correct adaptations.
  • ...and 2 more figures