TART: A Comprehensive Tool for Technique-Aware Audio-to-Tab Guitar Transcription

Akshaj Gupta, Andrea Guzman, Anagha Badriprasad, Hwi Joo Park, Upasana Puranik, Robin Netzorg, Jiachen Lian, Gopala Krishna Anumanchipalli

TL;DR

TART tackles the challenging problem of guitar automatic music transcription by delivering a four-stage pipeline that converts audio to MIDI, labels expressive techniques, resolves string–fret ambiguity with a Fretting-Transformer, and generates detailed ASCII tablature. The approach leverages piano-based transcription with guitar-domain fine-tuning, unified expressive-dataset training, and a sequence-to-sequence mapping for fingering, culminating in a playable tab output aligned with expressive gestures. Key contributions include a unified .jams-based data representation, cross-dataset expressive-technique classification with 76% accuracy, and near-perfect pitch alignment on challenging datasets when combined with post-processing, alongside a beginner-friendly tablature simplification path. This framework advances end-to-end guitar transcription and offers a practical route toward richer, more accurate guitar TAB generation in real-world settings.

Abstract

Automatic Music Transcription (AMT) has advanced significantly for the piano, but transcription for the guitar remains limited due to several key challenges. Existing systems fail to detect and annotate expressive techniques (e.g., slides, bends, percussive hits) and often map notes to the wrong string and fret combination in the generated tablature. Furthermore, prior models are typically trained on small, isolated datasets, limiting their generalizability to real-world guitar recordings. To overcome these limitations, we propose a four-stage end-to-end pipeline that produces detailed guitar tablature directly from audio. Our system consists of (1) audio-to-MIDI pitch conversion through a piano transcription model adapted to guitar datasets; (2) MLP-based expressive technique classification; (3) Transformer-based string and fret assignment; and (4) LSTM-based tablature generation. To the best of our knowledge, this framework is the first to generate detailed tablature with accurate fingerings and expressive labels from guitar audio.
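
To make the string and fret ambiguity concrete, the following minimal sketch lists every position on a guitar that sounds a given MIDI pitch. It is an illustration only, not the paper's Transformer-based assignment: the function name `candidate_positions`, the standard-tuning assumption, and the 20-fret limit are all assumptions introduced here.

```python
from typing import List, Tuple

# Open-string MIDI pitches in standard tuning (an assumption of this sketch):
# string 6 = low E2 (40) up to string 1 = high E4 (64).
STANDARD_TUNING = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}


def candidate_positions(midi_pitch: int, max_fret: int = 20) -> List[Tuple[int, int]]:
    """Return every (string, fret) pair that sounds the given MIDI pitch.

    max_fret is a hypothetical fretboard limit, not a value from the paper.
    """
    positions = []
    for string, open_pitch in sorted(STANDARD_TUNING.items()):
        fret = midi_pitch - open_pitch  # one fret per semitone above the open string
        if 0 <= fret <= max_fret:
            positions.append((string, fret))
    return positions


if __name__ == "__main__":
    # E4 (MIDI 64) can be played at five different positions under these assumptions.
    print(candidate_positions(64))
```

A single pitch can therefore map to several playable positions, which is why pitch information alone does not determine a tablature; a model for stage (3) has to choose among such candidates using the surrounding note sequence.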

Paper Structure

This paper contains 15 sections, 2 figures, and 6 tables.

Figures (2)

  • Figure 1: Visual representation of the TART framework. Raw guitar audio is first converted into MIDI note events, capturing pitch, onset, and offset information as shown in stage (1). In stage (2), an expressive technique classifier analyzes the audio to label each note with the corresponding techniques (e.g., hammer-on, tapping). In stage (3), we use a transformer model that takes MIDI note sequences as input and predicts string and fret positions. In stage (4), the gathered data is merged to generate the sample tablature shown.
  • Figure 2: Confusion matrix for the seven test-set-filtered output classes.
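
The merge step in stage (4) of Figure 1 can be pictured with a small rule-based renderer that turns notes, each with a string, a fret, and an optional technique label, into ASCII tablature. This is a hedged sketch and not the paper's LSTM-based generator: the function name `render_tab`, the tuple layout, and the use of the common tab symbol `h` for a hammer-on are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Tab lines from top to bottom: string 1 (high e) down to string 6 (low E).
STRING_NAMES = ["e", "B", "G", "D", "A", "E"]

# A note is (string number 1-6, fret, optional technique symbol such as "h").
Note = Tuple[int, int, Optional[str]]


def render_tab(notes: List[Note]) -> str:
    """Render notes, in playing order, as six lines of ASCII tablature."""
    lines = [name + "|" for name in STRING_NAMES]
    for string, fret, technique in notes:
        token = f"{technique or ''}{fret}"  # e.g. "h2" for a hammer-on to fret 2
        width = len(token) + 1              # one leading dash plus the token
        for idx in range(len(lines)):
            if idx == string - 1:
                lines[idx] += "-" + token
            else:
                lines[idx] += "-" * width   # pad other strings to stay aligned
    return "\n".join(line + "-|" for line in lines)


if __name__ == "__main__":
    # Fret 3 on string 6, then a hammer-on to fret 2 on string 5.
    print(render_tab([(6, 3, None), (5, 2, "h")]))
```

Each note occupies its own column here, so simultaneous notes (chords) and rhythm are not represented; a fuller renderer would need onset times from stage (1) to place notes that sound together in the same column.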