GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model
Xavier Riley, Zixun Guo, Drew Edwards, Simon Dixon
TL;DR
GAPS is a large, diverse 14-hour dataset of real classical guitar recordings with score-audio alignments, MIDI annotations, and performance videos, spanning 300 audio-score pairs from over 200 performers. It addresses the scarcity of data for guitar automatic music transcription (AMT). The authors train a high-resolution CRNN transcription model, pretrained on MAESTRO with augmentation and fine-tuned on GAPS, that achieves state-of-the-art results on GuitarSet in both supervised and zero-shot settings and generalises well across guitar timbres. They describe the dataset creation and validation pipeline, provide comprehensive metadata, and analyse tuning diversity and performance conditions, highlighting how data quality, quantity, and variety affect transcription performance. The work sets a benchmark for future guitar MIR datasets and more robust automatic guitar transcription, and also discusses ethical considerations and paths for expansion.
Abstract
We introduce GAPS (Guitar-Aligned Performance Scores), a new dataset of classical guitar performances, and a benchmark guitar transcription model that achieves state-of-the-art performance on GuitarSet in both supervised and zero-shot settings. GAPS is the largest dataset of real guitar audio, containing 14 hours of freely available audio-score aligned pairs, recorded in diverse conditions by over 200 performers, together with high-resolution note-level MIDI alignments and performance videos. These enable us to train a state-of-the-art model for automatic transcription of solo guitar recordings that generalises well to real-world audio unseen during training.
