Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning

Isaac Xu; Martin Gillis; Ayushi Sharma; Benjamin Misiuk; Craig J. Brown; Thomas Trappenberg

TL;DR

This work targets the persistent challenge of detecting rare, deep nodes in hierarchical multi-label learning (HML). It introduces a node-centered loss that combines imbalance weighting with a focal weighting term derived from ensemble uncertainty, applied within a Coherent Hierarchical Multi-Label Classification Neural Network (C-HMCNN). In experiments on the FUN/GO gene-product datasets and BenthicNet-E, the approach yields large recall gains for rare nodes (up to a factor of five on benchmark datasets) along with statistically significant $F_{1}$ gains, with notable improvements when using uncertainty-based focal terms such as bBMA and GMU, especially as ensemble size grows. The results indicate robustness to suboptimal encoders and limited data, offering a practical method for deep-hierarchy predictions across biology and vision domains.

Abstract

In hierarchical multi-label classification, a persistent challenge is enabling model predictions to reach deeper levels of the hierarchy for more detailed or fine-grained classifications. This difficulty partly arises from the natural rarity of certain classes (or hierarchical nodes) and the hierarchical constraint that ensures child nodes are almost always less frequent than their parents. To address this, we propose a weighted loss objective for neural networks that combines node-wise imbalance weighting with focal weighting components, the latter leveraging modern quantification of ensemble uncertainties. By emphasizing rare nodes rather than rare observations (data points), and focusing on uncertain nodes for each model output distribution during training, we observe improvements in recall by up to a factor of five on benchmark datasets, along with statistically significant gains in $F_{1}$ score. We also show that our approach aids convolutional networks on challenging tasks, such as those with suboptimal encoders or limited data.
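As a rough illustration of the idea described in the abstract, the sketch below combines a node-wise imbalance weight with a focal term derived from ensemble uncertainty in a binary cross-entropy loss. It is not the paper's formulation: the inverse-frequency weight, the use of the binary entropy of the ensemble mean as the uncertainty measure (a stand-in for measures such as bBMA or GMU, which this summary does not define), the exponent `gamma`, and all function names are assumptions made for illustration only.

```python
import numpy as np


def node_imbalance_weights(Y):
    """Hypothetical node-wise weights: inverse positive frequency per node.

    Y has shape (n_samples, n_nodes) with binary labels that are closed
    under ancestors (a positive child implies a positive parent).
    Weights are normalized so that their mean is 1.
    """
    n_samples = Y.shape[0]
    # Clip so that nodes with no positives do not get an unbounded weight.
    freq = np.clip(Y.mean(axis=0), 1.0 / n_samples, 1.0)
    w = 1.0 / freq
    return w / w.mean()


def ensemble_uncertainty(P):
    """Assumed uncertainty measure: binary entropy (in bits) of the ensemble mean.

    P has shape (n_members, n_samples, n_nodes) and holds predicted
    probabilities. The result has shape (n_samples, n_nodes), values in [0, 1].
    """
    p = np.clip(P.mean(axis=0), 1e-7, 1.0 - 1e-7)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


def weighted_bce(P, Y, w_node, gamma=1.0):
    """Binary cross-entropy scaled by node weights and an uncertainty focal term."""
    p = np.clip(P.mean(axis=0), 1e-7, 1.0 - 1e-7)
    bce = -(Y * np.log(p) + (1.0 - Y) * np.log(1.0 - p))
    focal = ensemble_uncertainty(P) ** gamma  # larger for uncertain nodes
    # w_node broadcasts across samples: rare nodes count more, not rare samples.
    return float((w_node[None, :] * focal * bce).mean())
```

With labels over a three-node chain (root, child, grandchild), the weights grow with depth because each child is at most as frequent as its parent, which is the rarity effect the abstract describes. Weighting by node rather than by sample means a common data point still contributes strongly through its rare nodes.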

Paper Structure (27 sections, 14 equations, 8 figures, 9 tables)

Introduction
Background
Learning with Hierarchical Constraint
Methods
Imbalance Weighting
Focal Weighting
Metrics
Related Works
Experiments
Datasets
Imbalance Weighting on Gene Product Data
Focal Weighting on Gene Product Data
Echinoderm Vision Modelling
Conclusion
Acknowledgements
...and 12 more sections

Figures (8)

Figure 1: Overview of HML Rare Node Problem and Proposed Weighted Approach. The proposed approach to detecting rare descendant nodes in HML problems consists of two branches. The imbalance branch focuses on a node-wise exploration weighting system independent of sample frequency, while the focal branch employs modern measures of uncertainty to determine which nodes are challenging for the model ensemble.
Figure 2: Comparison of the Effects of Varying $\tilde{w}_0$ on Derisi (FUN). Plotted are node-wise precision and recall scores across the top-20 most frequent nodes in their respective datasets.
Figure 3: Focal Weighting with Increasing Ensemble Size on Cellcycle (FUN). For both displayed measures, the bold central line indicates the mean of the uncertainty term used for focal weighting, while the blurred outline indicates a range of one $\sigma$.
Figure 4: Evaluation of $F_{1}$ over Training Completion on Echinoderms. The training factor acts as a parameter shifting from 0.0 (randomly initialized encoder) to 1.0 (fully trained ImageNet encoder). In panel (a), the weighting methods and an unweighted baseline are compared to each other. In panel (b), the difference between GMU and the unweighted baseline is shown, illustrating how the advantage of the weighting method changes as the encoder becomes better trained.
Figure 5: Combining Weighted and Unweighted Objectives. (a) Effect of the scheduler on $F_{1}$ score across all FUN and GO datasets, and (b) effect of varying $\lambda$ in the mixed loss objective.
...and 3 more figures
