Music Tagging with Classifier Group Chains

Takuya Hasumi, Tatsuya Komatsu, Yusuke Fujita

TL;DR

The paper addresses the limitation of treating music tags independently by introducing classifier group chains, which sequentially estimate tag groups (genre, instrument, mood/theme) using a GRU-based chain decoder atop a fixed audio encoder. This approach models conditional dependencies across tag groups, enabling context-aware predictions and improved tagging performance. Experiments on MTG-Jamendo show gains in ROC-AUC and PR-AUC over conventional decoders, with chain order substantially affecting results. The work highlights the practical benefit of modeling tag interactions for music tagging and suggests future integration with language-model–based captioning to further capture tag interdependencies.

Abstract

We propose music tagging with classifier chains that model the interplay of music tags. Most conventional methods estimate multiple tags independently by treating them as multiple independent binary classification problems. This treatment overlooks the conditional dependencies among music tags, leading to suboptimal tagging performance. Unlike most music taggers, the proposed method sequentially estimates each tag based on the idea of the classifier chains. Beyond the naive classifier chains, the proposed method groups the multiple tags by category, such as genre, and performs chains by unit of groups, which we call classifier group chains. Our method allows the modeling of the dependence between tag groups. We evaluate the effectiveness of the proposed method for music tagging performance through music tagging experiments using the MTG-Jamendo dataset. Furthermore, we investigate the effective order of chains for music tagging.
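
The difference between the three modeling assumptions named in the abstract can be written as factorizations of the joint tag distribution. The notation here is an assumption made for illustration, not taken from the paper's equations: $\mathbf{x}$ is the input audio, $y_k$ is the binary flag of the $k$-th of $K$ tags, and $\mathbf{y}_\gamma$ collects the flags of the $\gamma$-th of $\Gamma$ tag groups.

$$
\begin{aligned}
\text{independent tags:} \quad & p(y_1, \dots, y_K \mid \mathbf{x}) = \prod_{k=1}^{K} p(y_k \mid \mathbf{x}) \\
\text{classifier chains:} \quad & p(y_1, \dots, y_K \mid \mathbf{x}) = \prod_{k=1}^{K} p(y_k \mid \mathbf{x}, y_1, \dots, y_{k-1}) \\
\text{classifier group chains:} \quad & p(\mathbf{y}_1, \dots, \mathbf{y}_\Gamma \mid \mathbf{x}) = \prod_{\gamma=1}^{\Gamma} p(\mathbf{y}_\gamma \mid \mathbf{x}, \mathbf{y}_1, \dots, \mathbf{y}_{\gamma-1})
\end{aligned}
$$

The second line is the exact chain rule. The first line drops every dependence between tags, and the third keeps the dependence only at the level of groups, so the chain has $\Gamma$ steps instead of $K$. Because a trained model only approximates each conditional, the order in which groups are chained can change the result, which is why the chain order is studied separately.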

Paper Structure (12 sections, 9 equations, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Overview of conventional music tagging system with $K$ music tags. $K$ sub-decoders (affine transformation + sigmoid function) estimate each binary flag of the music tag independently.
  • Figure 2: Overview of the proposed music tagging with classifier group chains in the order "genre" $\rightarrow$ "instrument" $\rightarrow$ "mood/theme". Only the decoder part is shown; the encoder part is identical to the one in Fig. 1. Unlike the conventional method, the $\gamma$-th music tag group (category) is estimated sequentially using the previous estimation results.
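
The two decoders described in the figure captions can be contrasted with a small, self-contained sketch. It is not the authors' implementation: the weights are random, the group sizes are toy values, and the `feedback` projection that turns the previous group's estimate into a fixed-size GRU input is a hypothetical choice made here so that groups of different sizes can share one recurrent cell. All function and parameter names are illustrative.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def affine(W, b, x):
    # Computes W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_affine(rng, out_dim, in_dim):
    # Random weights and zero bias; a stand-in for trained parameters.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim

def gru_step(p, x, h):
    # One GRU update: update gate z, reset gate r, candidate state n.
    z = sigmoid(affine(*p["z"], x + h))
    r = sigmoid(affine(*p["r"], x + h))
    gated = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in affine(*p["n"], x + gated)]
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def independent_decoder(heads, e):
    # Conventional decoder (Figure 1): one affine + sigmoid per tag,
    # each computed from the embedding alone.
    return [sigmoid(affine(W, b, e))[0] for W, b in heads]

def build_chain_model(rng, emb_dim, hidden, fb_dim, group_sizes):
    in_dim = emb_dim + fb_dim
    gru = {k: init_affine(rng, hidden, in_dim + hidden) for k in ("z", "r", "n")}
    out = {g: init_affine(rng, n, hidden) for g, n in group_sizes.items()}
    # Assumption: project each group's estimate to fb_dim before feeding it back.
    feedback = {g: init_affine(rng, fb_dim, n) for g, n in group_sizes.items()}
    return {"hidden": hidden, "fb_dim": fb_dim, "gru": gru,
            "out": out, "feedback": feedback}

def group_chain_decoder(model, e, order):
    # Chain decoder (Figure 2): estimate one tag group per step, conditioning
    # each step on the previous group's estimate and the GRU state.
    h = [0.0] * model["hidden"]
    prev = [0.0] * model["fb_dim"]  # no previous estimate at the first step
    probs = {}
    for g in order:
        h = gru_step(model["gru"], e + prev, h)
        probs[g] = sigmoid(affine(*model["out"][g], h))
        prev = affine(*model["feedback"][g], probs[g])
    return probs

rng = random.Random(0)
group_sizes = {"genre": 4, "instrument": 3, "mood/theme": 2}  # toy sizes
embedding = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in encoder output

heads = [init_affine(rng, 1, 8) for _ in range(sum(group_sizes.values()))]
flat = independent_decoder(heads, embedding)

model = build_chain_model(rng, emb_dim=8, hidden=6, fb_dim=5, group_sizes=group_sizes)
probs = group_chain_decoder(model, embedding, ["genre", "instrument", "mood/theme"])
for group, values in probs.items():
    print(group, [round(p, 3) for p in values])
```

Passing a different `order` list to `group_chain_decoder` changes which group conditions on which, mirroring the chain-order comparison mentioned in the summary. The sketch feeds back the estimated probabilities at every step; whether ground-truth tags are used instead during training is not specified here and is left open.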