BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music

Mingyang Yao, Ke Chen, Shlomo Dubnov, Taylor Berg-Kirkpatrick

TL;DR

This work tackles symbolic chord recognition under data scarcity by introducing POP909-CL, a human-corrected extension of POP909, and BACHI, a boundary-aware transformer that mirrors ear-training through boundary detection followed by masked, confidence-ranked decoding of chord elements $(r, q, b)$. The model operates on beat-synchronous MIDI tokens with a patch-embedded piano-roll input $P \in \{0,1\}^{T \times 88}$ and outputs three chord features per frame, using FiLM conditioning from predicted boundaries and a non-autoregressive, iterative filling strategy. Evaluations on classical (WiR+dcml) and POP909-CL show state-of-the-art results, with ablations confirming the importance of boundary detection and iterative decoding and illustrating genre-specific decision patterns (e.g., quality-first in classical, bass-first in POP909-CL). Overall, the paper delivers a reliable symbolic ACR dataset and a principled, human-inspired decoding approach that enhances accuracy across diverse repertoires, with potential implications for symbolic MIR, music generation, and education.
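The binary piano-roll input $P \in \{0,1\}^{T \times 88}$ can be illustrated with a small sketch. This is not the authors' preprocessing code: the note format `(midi_pitch, start_frame, end_frame)`, the function name `piano_roll`, and the frame resolution are assumptions made for illustration. The only fixed facts used are that an 88-key piano spans MIDI pitches 21 (A0) to 108 (C8).

```python
from typing import List, Tuple

LOWEST_PITCH = 21  # MIDI number of A0, the lowest key on an 88-key piano
NUM_KEYS = 88


def piano_roll(notes: List[Tuple[int, int, int]], num_frames: int) -> List[List[int]]:
    """Build a binary T x 88 piano-roll.

    notes: hypothetical (midi_pitch, start_frame, end_frame) triples,
           with end_frame exclusive and frames assumed beat-synchronous.
    """
    roll = [[0] * NUM_KEYS for _ in range(num_frames)]
    for pitch, start, end in notes:
        key = pitch - LOWEST_PITCH
        if not 0 <= key < NUM_KEYS:
            continue  # skip pitches outside the piano range
        # Clip the note to the roll and mark every frame it sounds in.
        for t in range(max(start, 0), min(end, num_frames)):
            roll[t][key] = 1
    return roll


# C major triad (C4, E4, G4) for frames 0-3, then a single G3 for frames 4-7.
notes = [(60, 0, 4), (64, 0, 4), (67, 0, 4), (55, 4, 8)]
P = piano_roll(notes, num_frames=8)
```

Here `P` has $T = 8$ rows, and each row is an 88-dimensional binary vector marking which keys sound in that frame.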

Abstract

Automatic chord recognition (ACR) via deep learning models has gradually achieved promising recognition accuracy, yet two key challenges remain. First, prior work has primarily focused on audio-domain ACR, while ACR for symbolic music (e.g., scores) has received limited attention due to data scarcity. Second, existing methods still overlook strategies that align with human music-analytical practice. To address these challenges, we make two contributions: (1) we introduce POP909-CL, an enhanced version of the POP909 dataset with tempo-aligned content and human-corrected labels for chords, beats, keys, and time signatures; and (2) we propose BACHI, a symbolic chord recognition model that decomposes the task into separate decision steps, namely boundary detection and iterative ranking of chord root, quality, and bass (inversion). This mechanism mirrors human ear-training practice. Experiments demonstrate that BACHI achieves state-of-the-art chord recognition performance on both classical and pop music benchmarks, with ablation studies validating the effectiveness of each module.
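The iterative ranking of root, quality, and bass can be sketched for a single chord segment as follows. This is a minimal illustration under stated assumptions, not BACHI's implementation: it assumes exactly one element is committed per step, and `toy_predict` with its probabilities is a hypothetical stand-in for the trained network.

```python
from typing import Callable, Dict, List, Optional, Tuple

Slots = Dict[str, Optional[str]]
Predictor = Callable[[Slots], Dict[str, Dict[str, float]]]


def iterative_decode(predict: Predictor, slot_names: List[str]) -> Tuple[Slots, List[str]]:
    """Fill masked chord elements one at a time, most confident first."""
    slots: Slots = {name: None for name in slot_names}  # None means "masked"
    order: List[str] = []
    while any(value is None for value in slots.values()):
        probs = predict(slots)  # re-predict, conditioned on filled slots
        best_slot, best_label, best_conf = None, None, -1.0
        for name in slot_names:
            if slots[name] is not None:
                continue  # already committed
            label, conf = max(probs[name].items(), key=lambda kv: kv[1])
            if conf > best_conf:
                best_slot, best_label, best_conf = name, label, conf
        slots[best_slot] = best_label  # commit the most confident element
        order.append(best_slot)
    return slots, order


def toy_predict(slots: Slots) -> Dict[str, Dict[str, float]]:
    """Hypothetical predictor: made-up probabilities, not model output."""
    probs = {
        "root": {"C": 0.6, "A": 0.4},
        "quality": {"maj": 0.55, "min": 0.45},
        "bass": {"E": 0.9, "C": 0.1},
    }
    if slots["bass"] == "E":  # a known bass sharpens the root estimate
        probs["root"] = {"C": 0.8, "A": 0.2}
    if slots["root"] == "C":  # a known root sharpens the quality estimate
        probs["quality"] = {"maj": 0.95, "min": 0.05}
    return probs


chord, order = iterative_decode(toy_predict, ["root", "quality", "bass"])
print(chord, order)
```

In this toy run the bass is the most confident element and is committed first, followed by root and then quality, giving a C major chord over E. The decoding order is not fixed in advance but emerges from the confidences, which is the property the abstract's ear-training analogy relies on.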

Paper Structure

This paper contains 12 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The model architecture and inference mechanism of BACHI, showing the model backbone (left and middle), boundary detection and conditioning (middle), and iterative decoding (right).
  • Figure 2: Comparison of chord labels from the rule-based extraction (original POP909) with the human-corrected labels (POP909-CL).
  • Figure 3: Confusion matrices for chord quality on the classical corpus and POP909-CL evaluation sets.