Music Tagging with Classifier Group Chains
Takuya Hasumi, Tatsuya Komatsu, Yusuke Fujita
TL;DR
The paper addresses the limitation of treating music tags independently by introducing classifier group chains, which sequentially estimate tag groups (genre, instrument, mood/theme) using a GRU-based chain decoder atop a fixed audio encoder. This approach models conditional dependencies across tag groups, enabling context-aware predictions and improved tagging performance. Experiments on MTG-Jamendo show gains in ROC-AUC and PR-AUC over conventional decoders, with chain order substantially affecting results. The work highlights the practical benefit of modeling tag interactions for music tagging and suggests future integration with language-model–based captioning to further capture tag interdependencies.
Abstract
We propose music tagging with classifier chains that model the interplay of music tags. Most conventional methods estimate multiple tags independently by treating tagging as a set of independent binary classification problems. This treatment overlooks the conditional dependencies among music tags, leading to suboptimal tagging performance. Unlike most music taggers, the proposed method estimates tags sequentially, following the idea of classifier chains. Going beyond naive classifier chains, the proposed method groups the tags by category, such as genre, and chains the groups rather than individual tags; we call this approach classifier group chains. This allows the method to model the dependence between tag groups. We evaluate the proposed method through music tagging experiments on the MTG-Jamendo dataset. Furthermore, we investigate which chain orders are effective for music tagging.
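
The group-wise chaining described above can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes plain linear classifiers with random weights for the paper's decoder, and the group sizes, the embedding dimension, the chain order, and the choice to pass soft probabilities along the chain are all assumptions made for illustration. The only point it shows is that each group's classifier receives the audio embedding together with the estimates of every group earlier in the chain.

```python
import math
import random

# Hypothetical tag groups and sizes, in an assumed chain order.
# The real MTG-Jamendo tag vocabularies are larger.
GROUPS = [("genre", 4), ("instrument", 3), ("mood/theme", 2)]
EMBED_DIM = 8  # assumed size of the audio embedding


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_chain(groups, embed_dim, seed=0):
    """Create one random linear classifier per tag group.

    The classifier for a group sees the audio embedding plus the
    estimates of all groups that come earlier in the chain.
    """
    rng = random.Random(seed)
    chain, context_dim = [], embed_dim
    for name, n_tags in groups:
        weights = [[rng.gauss(0.0, 0.1) for _ in range(context_dim)]
                   for _ in range(n_tags)]
        chain.append((name, weights))
        context_dim += n_tags  # later groups also see this group's output
    return chain


def predict(chain, embedding):
    """Estimate tag probabilities group by group, in chain order."""
    context = list(embedding)
    outputs = {}
    for name, weights in chain:
        probs = [sigmoid(sum(w * c for w, c in zip(row, context)))
                 for row in weights]
        outputs[name] = probs
        context.extend(probs)  # condition the next group on these estimates
    return outputs


# Stand-in for an audio encoder output (random values, illustration only).
rng = random.Random(1)
embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]

chain = init_chain(GROUPS, EMBED_DIM)
predictions = predict(chain, embedding)
for name, probs in predictions.items():
    print(name, [round(p, 3) for p in probs])
```

Reordering the entries of `GROUPS` changes which groups condition on which, which is the kind of chain-order choice the abstract says the paper investigates.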
