Transcribing Rhythmic Patterns of the Guitar Track in Polyphonic Music
Aleksandr Lukoianov, Anssi Klapuri
TL;DR
This work tackles the transcription of rhythmic guitar patterns in polyphonic music, an underexplored aspect compared to chord transcription. It introduces a three-step framework: approximate stem separation to isolate the guitar, strum detection with a pre-trained foundation model (MERT) refined by fine-tuning, and a pattern-decoding stage that maps strums to an expert-curated rhythmic vocabulary, augmented by bar-line estimation. A ground-truth dataset of 931 recordings across 410 songs with 924 rhythm patterns enables evaluation of accuracy and readability, and the authors release open-source tooling for the process. The results demonstrate high accuracy and human readability, with robust bar-line and time-signature estimation, and ablations illustrate the benefits of fine-tuning, data augmentation, and the Viterbi-based decoding framework. This work provides a practical pipeline for lead-sheet style rhythmic transcription in polyphonic music and has potential implications for music information retrieval, arrangement, and education.
Abstract
Whereas chord transcription has received considerable attention during the past couple of decades, far less work has been devoted to transcribing and encoding the rhythmic patterns that occur in a song. The topic is especially relevant for instruments such as the rhythm guitar, which is typically played by strumming rhythmic patterns that repeat and vary over time. However, in many cases one cannot objectively define a single "right" rhythmic pattern for a given song section. To create a dataset with well-defined ground-truth labels, we asked expert musicians to transcribe the rhythmic patterns in 410 popular songs and to record cover versions in which the guitar tracks follow those transcriptions. To transcribe the strums and their corresponding rhythmic patterns, we propose a three-step framework. First, we perform approximate stem separation to extract the guitar part from the polyphonic mixture. Second, we detect individual strums within the separated guitar audio, using a pre-trained foundation model (MERT) as a backbone. Finally, we carry out a pattern-decoding process in which the transcribed sequence of guitar strums is represented by patterns drawn from an expert-curated vocabulary. We show that it is possible to transcribe the rhythmic patterns of the guitar track in polyphonic music with high accuracy, producing a representation that is human-readable and includes automatically detected bar lines and time signature markers. We perform ablation studies and error analysis and propose a set of evaluation metrics to assess the accuracy and readability of the predicted rhythmic pattern sequence.
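
As an illustration of the pattern-decoding step, the following minimal sketch shows how a sequence of detected strums could be mapped to a vocabulary of patterns with Viterbi decoding. It is not the authors' implementation: the abstract does not specify the decoder, so the toy vocabulary `VOCAB`, the eighth-note grid with one binary slot per position, the Hamming-distance emission cost, and the `switch_penalty` parameter are all assumptions made for this example. Bar lines are taken as given here, whereas the paper estimates them automatically.

```python
from typing import Dict, List, Sequence, Tuple

Bar = Tuple[int, ...]

# Hypothetical vocabulary (assumption): 8 eighth-note slots per 4/4 bar,
# 1 = strum on that slot, 0 = no strum.
VOCAB: Dict[str, Bar] = {
    "quarters": (1, 0, 1, 0, 1, 0, 1, 0),
    "eighths": (1, 1, 1, 1, 1, 1, 1, 1),
    "folk": (1, 0, 1, 1, 0, 1, 1, 1),
}


def hamming(a: Bar, b: Bar) -> int:
    """Number of slots where the detected bar and the pattern disagree."""
    return sum(x != y for x, y in zip(a, b))


def decode_patterns(
    bars: Sequence[Bar],
    vocab: Dict[str, Bar] = VOCAB,
    switch_penalty: float = 1.5,
) -> List[str]:
    """Return the cheapest pattern label per bar (Viterbi over bars).

    Emission cost: Hamming distance between detected strums and a pattern.
    Transition cost: 0 for repeating a pattern, switch_penalty for changing,
    which favours long repeated runs (an assumed proxy for readability).
    """
    if not bars:
        return []
    names = list(vocab)
    states = range(len(names))
    # cost[i] = cheapest path so far that ends in pattern names[i]
    cost = [float(hamming(bars[0], vocab[n])) for n in names]
    back: List[List[int]] = []
    for bar in bars[1:]:
        new_cost, pointers = [], []
        for j in states:
            # Best predecessor of state j, including the switching cost.
            best_i = min(
                states,
                key=lambda i: cost[i] + (0.0 if i == j else switch_penalty),
            )
            trans = 0.0 if best_i == j else switch_penalty
            new_cost.append(cost[best_i] + trans + hamming(bar, vocab[names[j]]))
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final state.
    j = min(states, key=lambda i: cost[i])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [names[i] for i in path]
```

With the default penalty, a single bar that deviates from its neighbours is absorbed into the surrounding pattern rather than being given its own label, while a sustained change still produces a switch. Lowering `switch_penalty` makes the output follow the detected strums more literally at the cost of a more fragmented transcription.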
