MIDI-to-Tab: Guitar Tablature Inference via Masked Language Modeling

Drew Edwards, Xavier Riley, Pedro Sarmento, Simon Dixon

TL;DR

The paper addresses the conversion of symbolic guitar scores into tablature, which requires assigning each note to a string and fret; the task is complicated by the fact that most pitches can be played at several positions. It introduces a masked language modeling approach that uses a BART-style encoder–decoder Transformer with MidiTok's Structured tokenization to predict per-note string assignments. The model is trained in two phases: pre-training on the large-scale DadaGP tablature dataset, then fine-tuning on professionally transcribed performances. Quantitative results show high next-note accuracy and strong autoregressive agreement, and in a user study with 15 guitarists the system's tablatures were often preferred over those of commercial tools, indicating improved practical playability. The work advances automatic tablature inference without audio or video cues and points to future enhancements in tunings, articulations, and more physics-informed post-processing.

Abstract

Guitar tablatures enrich the structure of traditional music notation by assigning each note to a string and fret of a guitar in a particular tuning, indicating precisely where to play the note on the instrument. The problem of generating tablature from a symbolic music representation involves inferring this string and fret assignment per note across an entire composition or performance. On the guitar, multiple string-fret assignments are possible for most pitches, which leads to a large combinatorial space that prevents exhaustive search approaches. Most modern methods use constraint-based dynamic programming to minimize some cost function (e.g., hand position movement). In this work, we introduce a novel deep learning solution to symbolic guitar tablature estimation. We train an encoder-decoder Transformer model in a masked language modeling paradigm to assign notes to strings. The model is first pre-trained on DadaGP, a dataset of over 25K tablatures, and then fine-tuned on a curated set of professionally transcribed guitar performances. Given the subjective nature of assessing tablature quality, we conduct a user study amongst guitarists, wherein we ask participants to rate the playability of multiple versions of tablature for the same four-bar excerpt. The results indicate our system significantly outperforms competing algorithms.
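
To make the combinatorial claim concrete, the following minimal sketch enumerates the string-fret positions available for a MIDI pitch and counts the unconstrained assignments for a short passage. It is an illustration, not code from the paper: it assumes standard tuning (E2 A2 D3 G3 B3 E4) and a 22-fret neck, and the names `candidate_positions` and `search_space_size` are hypothetical.

```python
from math import prod

# Open-string MIDI pitches in standard tuning, string 6 (low E) to string 1 (high E).
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}
MAX_FRET = 22  # assumption: the number of frets varies by instrument


def candidate_positions(pitch, tuning=STANDARD_TUNING, max_fret=MAX_FRET):
    """Return every (string, fret) pair that sounds the given MIDI pitch."""
    return [
        (string, pitch - open_pitch)
        for string, open_pitch in tuning.items()
        if 0 <= pitch - open_pitch <= max_fret
    ]


def search_space_size(pitches, tuning=STANDARD_TUNING, max_fret=MAX_FRET):
    """Count note-by-note assignments, ignoring all playability constraints."""
    return prod(len(candidate_positions(p, tuning, max_fret)) for p in pitches)


if __name__ == "__main__":
    print(candidate_positions(64))        # E4: playable on five strings
    print(search_space_size([64] * 16))   # sixteen E4s: 5 ** 16 assignments
```

Under these assumptions a single E4 already has five positions, so even a sixteen-note passage of that pitch admits 5^16 (about 1.5 × 10^11) assignments before any playability constraint is applied, which is why exhaustive search is impractical.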

Paper Structure

This paper contains 14 sections, 2 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Overview of the training procedure. Guitar Pro files from DadaGP are converted to six-track MIDI files, one file per distinct guitar part and one track per string. These are tokenized into the Structured tokenization of MidiTok. We train a BART model in a simple masked language modeling task where the string tokens are masked out. Only the predictions for the string tokens are used for loss signal propagation.
  • Figure 2: A diagram of our quintile inference algorithm. The middle fifth of the attention window is predicted in an auto-regressive fashion. String assignments from earlier quintiles are fixed. Future notes are available in the context window but will not be assigned until the processing window places them in the center. The beam search is not depicted.
  • Figure 3: Heatmaps of the fret-string distributions for three of the five tablature systems (ground truth, ours, and Guitar Pro 8). Overall, our system has a similar distribution to the ground truth, but the output appears to be biased away from open strings. Guitar Pro 8 shows a heavy skew to low frets, which perhaps suggests a bias towards playing in "first position" (playing primarily on frets 1 to 4).
  • Figure 4: Comparison of the distributions of stretch distances between chords in the test set.
  • Figure 5: An example failure of our system. Ground truth is left, ours is right. The assignment of B2 to the fifth string creates an 8-fret stretch, which is essentially unplayable.
  • ...and 3 more figures
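
The sliding-window idea in the Figure 2 caption can be sketched as a schedule over note indices. This is only an illustration under stated assumptions, not the authors' implementation: the window is measured in notes rather than tokens, it is clamped at the start and end of the piece (the caption does not say how the edges are handled), and the model call and beam search are omitted. The name `quintile_schedule` is hypothetical.

```python
def quintile_schedule(num_notes, window=50):
    """Yield (ctx_start, ctx_end, fix_start, fix_end) note-index ranges.

    At each step, the notes in [fix_start, fix_end) receive string
    assignments, while [ctx_start, ctx_end) is the context the model sees.
    Notes before fix_start already have fixed assignments; notes from
    fix_end onward are visible as context but still unassigned.
    """
    if window <= 0 or window % 5 != 0:
        raise ValueError("window must be a positive multiple of 5")
    stride = window // 5  # one quintile
    for fix_start in range(0, num_notes, stride):
        fix_end = min(fix_start + stride, num_notes)
        # Two quintiles of past context and two of future context,
        # clamped to the piece boundaries (an assumption of this sketch).
        ctx_start = max(0, fix_start - 2 * stride)
        ctx_end = min(num_notes, fix_start + 3 * stride)
        yield ctx_start, ctx_end, fix_start, fix_end


if __name__ == "__main__":
    for step in quintile_schedule(20, window=10):
        print(step)
```

Away from the edges, the assigned range is exactly the middle fifth of the context window, and each note is assigned once, when the window has moved far enough to place it in the center.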