Towards Generalizability to Tone and Content Variations in the Transcription of Amplifier Rendered Electric Guitar Audio
Yu-Hua Chen, Yuan-Chiao Cheng, Yen-Tung Yeh, Jui-Te Wu, Jyh-Shing Roger Jang, Yi-Hsuan Yang
TL;DR
This work tackles automatic transcription of amplifier-rendered electric guitar audio, a domain hampered by limited data and wide tone variation. It introduces EGDB-PG, a large multi-tone dataset with 256 presets (16 amplifier heads × 16 cabinets), and the Tone-informed Transformer (TIT), which conditions transcription on a tone embedding $c=f_{\text{tone}}(\mathbf{x}_{r,\theta})$ to predict a score $\hat{\mathbf{s}} = h_{\text{trans}}(\mathbf{x}_{r,\theta}, c)$. Through ablations and comparisons with baselines, the study shows that tone embeddings, content augmentation, and audio normalization significantly improve transcription accuracy on both in-domain and out-of-domain amplifier tones, with TIT outperforming existing architectures. Out-of-domain experiments using Neural DSP tones show that a large, diverse tone prior combined with tone-aware conditioning improves generalization. Together these results establish a practical framework for robust tone-aware electric guitar transcription and pave the way for broader research on transcribing effect-rendered audio.
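The two-stage formulation above can be sketched in a few lines of NumPy. This is only an illustration of the interface, not the TIT architecture: the preset names, the array sizes, the random linear maps standing in for $f_{\text{tone}}$ and $h_{\text{trans}}$, and the choice to condition by concatenating $c$ to every frame are all assumptions made for the example. The input `x` plays the role of $\mathbf{x}_{r,\theta}$, taken here to be a frame-level feature matrix of the rendered audio.

```python
import itertools
import numpy as np

# Hypothetical preset grid: 16 amplifier heads x 16 cabinets = 256 tones.
AMP_HEADS = [f"amp_{i:02d}" for i in range(16)]
CABINETS = [f"cab_{j:02d}" for j in range(16)]
PRESETS = list(itertools.product(AMP_HEADS, CABINETS))

rng = np.random.default_rng(0)
N_BINS, EMB_DIM, N_PITCHES = 64, 8, 49  # illustrative sizes, not from the paper

# Random linear maps stand in for the learned networks.
W_tone = rng.standard_normal((N_BINS, EMB_DIM)) * 0.1
W_trans = rng.standard_normal((N_BINS + EMB_DIM, N_PITCHES)) * 0.1


def f_tone(x):
    """Stand-in tone encoder: average features over time, then project to c."""
    return x.mean(axis=0) @ W_tone  # shape (EMB_DIM,)


def h_trans(x, c):
    """Stand-in transcriber: append c to every frame, then score each pitch."""
    cond = np.broadcast_to(c, (x.shape[0], c.shape[0]))
    logits = np.concatenate([x, cond], axis=1) @ W_trans
    return 1.0 / (1.0 + np.exp(-logits))  # frame-wise note probabilities


x = rng.standard_normal((100, N_BINS))  # fake (frames, bins) features
c = f_tone(x)          # tone embedding
s_hat = h_trans(x, c)  # piano-roll-like estimate, shape (frames, pitches)
```

The point of the sketch is the data flow: the tone embedding is computed from the same rendered audio that is transcribed, so no preset label is needed at inference time.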
Abstract
Transcribing electric guitar recordings is challenging due to the scarcity of diverse datasets and the complex tone-related variations introduced by amplifiers, cabinets, and effect pedals. To address these issues, we introduce EGDB-PG, a novel dataset designed to capture a wide range of tone-related characteristics across various amplifier-cabinet configurations. In addition, we propose the Tone-informed Transformer (TIT), a Transformer-based transcription model enhanced with a tone embedding mechanism that leverages learned representations to improve the model's adaptability to tone-related nuances. Experiments demonstrate that TIT, trained on EGDB-PG, outperforms existing baselines across diverse amplifier types, with transcription accuracy improvements driven by the dataset's diversity and the tone embedding technique. Through detailed benchmarking and ablation studies, we evaluate the impact of tone augmentation, content augmentation, audio normalization, and tone embedding on transcription performance. This work advances electric guitar transcription by overcoming limitations in dataset diversity and tone modeling, providing a robust foundation for future research.
