Online hierarchical partitioning of the output space in extreme multi-label data stream

Lara Neves, Afonso Lourenço, Alberto Cano, Goreti Marreiros

TL;DR

This work tackles streaming multi-label classification under evolving label distributions by introducing iHOMER, an online framework that incrementally partitions the label space into disjoint clusters of correlated labels and trains a global learner per partition. It combines online clustering driven by Jaccard-based label dissimilarity, a growing hierarchical partition, and drift-aware reconfiguration to address high dimensionality, label imbalance, and non-stationarity. Empirical results on 23 real-world datasets show that iHOMER consistently outperforms both global and local baselines, with significant gains in exact-match and micro-averaged metrics, demonstrating the value of hybrid partitions for modeling label dependencies in data streams. The approach offers a practical, scalable solution for robust online multi-label classification, with open-source resources for reproduction and extension.
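
The exact-match (subset accuracy) and micro-averaged metrics mentioned above can be sketched as running counters that are updated once per stream instance. This is a minimal illustration, not the paper's evaluation code: it assumes each label set is encoded as a Python set of integer label indices, and the class name StreamingMultiLabelMetrics is hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class StreamingMultiLabelMetrics:
    """Hypothetical running counters for exact match and micro-averaged F1."""
    n: int = 0      # instances seen so far
    exact: int = 0  # instances whose predicted label set equals the true set
    tp: int = 0     # true positives pooled over all labels and instances
    fp: int = 0     # false positives pooled over all labels and instances
    fn: int = 0     # false negatives pooled over all labels and instances

    def update(self, y_true: Set[int], y_pred: Set[int]) -> None:
        """Accumulate the outcome of one prediction (evaluate before training on it)."""
        self.n += 1
        self.exact += int(y_true == y_pred)
        self.tp += len(y_true & y_pred)
        self.fp += len(y_pred - y_true)
        self.fn += len(y_true - y_pred)

    def subset_accuracy(self) -> float:
        """Fraction of instances predicted with exactly the right label set."""
        return self.exact / self.n if self.n else 0.0

    def micro_f1(self) -> float:
        """F1 from pooled counts: 2TP / (2TP + FP + FN); 0.0 when nothing was counted."""
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0


if __name__ == "__main__":
    metrics = StreamingMultiLabelMetrics()
    stream = [({0, 1}, {0, 1}), ({2}, {2, 3}), ({0}, set())]
    for truth, prediction in stream:
        metrics.update(truth, prediction)
    print(metrics.subset_accuracy(), metrics.micro_f1())
```

On this toy stream only the first instance is an exact match, so subset accuracy is 1/3, while micro-F1 pools 3 true positives, 1 false positive, and 1 false negative into 0.75. This illustrates why exact match is the stricter of the two metrics on wide label spaces.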

Abstract

Mining data streams with multi-label outputs poses significant challenges due to evolving distributions, high-dimensional label spaces, sparse label occurrences, and complex label dependencies. Moreover, concept drift affects not only input distributions but also label correlations and imbalance ratios over time, complicating model adaptation. Structured learners that tackle these challenges fall into two categories: local and global methods. Local methods decompose the task into simpler components, while global methods adapt the algorithm to the full output space, potentially yielding better predictions by exploiting label correlations. This work introduces iHOMER (Incremental Hierarchy Of Multi-label Classifiers), an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. iHOMER combines online divisive-agglomerative clustering based on Jaccard similarity with a global tree-based learner driven by a multivariate Bernoulli process to guide instance partitioning. To handle non-stationarity, it integrates drift detection mechanisms at both the global and local levels, enabling dynamic restructuring of label partitions and subtrees. Experiments across 23 real-world datasets show that iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets, and iSOUPT, by 23%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32%, establishing its robustness for online multi-label classification.
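
The Jaccard ingredient of the label clustering can be illustrated with a small sketch that maintains label counts and pairwise co-occurrence counts incrementally, and derives the dissimilarity 1 - |A_i ∩ A_j| / |A_i ∪ A_j|, where A_i is the set of instances seen so far in which label i is active. The class OnlineJaccard and its split method are hypothetical. The split simply divides a cluster around its two most dissimilar labels, which is a simplified stand-in for a divisive step. It omits the agglomerative (merge) step, any statistical confidence test before splitting, and drift handling, so it should not be read as the authors' procedure.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


class OnlineJaccard:
    """Hypothetical helper: incremental label counts and Jaccard dissimilarity."""

    def __init__(self, n_labels: int) -> None:
        self.count = [0] * n_labels               # instances in which label i is active
        self.co: Dict[Tuple[int, int], int] = {}  # sparse pair counts, keyed with i < j

    def update(self, active: Set[int]) -> None:
        """Consume the active label set of one stream instance."""
        for i in active:
            self.count[i] += 1
        for pair in combinations(sorted(active), 2):
            self.co[pair] = self.co.get(pair, 0) + 1

    def dissimilarity(self, i: int, j: int) -> float:
        """1 - intersection / union of the labels' instance sets; never co-occurring pairs score 1.0."""
        if i == j:
            return 0.0
        a, b = min(i, j), max(i, j)
        inter = self.co.get((a, b), 0)
        union = self.count[a] + self.count[b] - inter
        return 1.0 - inter / union if union else 1.0

    def split(self, cluster: List[int]) -> Tuple[List[int], List[int]]:
        """Divide a cluster (at least 2 labels) around its two most dissimilar labels."""
        p, q = max(combinations(cluster, 2), key=lambda pair: self.dissimilarity(*pair))
        left, right = [p], [q]
        for lab in cluster:
            if lab in (p, q):
                continue
            # Assign each remaining label to the closer pivot (ties go left).
            if self.dissimilarity(lab, p) <= self.dissimilarity(lab, q):
                left.append(lab)
            else:
                right.append(lab)
        return left, right


if __name__ == "__main__":
    sim = OnlineJaccard(n_labels=4)
    for labels in [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {0, 1}]:
        sim.update(labels)
    print(sim.dissimilarity(0, 1), sim.dissimilarity(0, 2))
    print(sim.split([0, 1, 2, 3]))
```

In the toy stream, labels 0 and 1 always appear together, as do 2 and 3, so the split recovers the two correlated groups. Keeping pair counts in a sparse dictionary makes memory grow with the label pairs that actually co-occur rather than with the square of the label count, which matters when most label pairs in an extreme multi-label space never appear together.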

Paper Structure

This paper contains 9 sections, 11 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Hierarchy of label clustered classifiers evolves over time.
  • Figure 2: Dissimilarity graph based on label co-occurrences.
  • Figure 3: Dynamic hierarchical label clusters in Hypersphere dataset.
  • Figure 4: Rolling Sample Accuracy on the Yelp dataset.
  • Figure 5: Nemenyi test - Subset Accuracy.
  • ...and 3 more figures