Climbing the label tree: Hierarchy-preserving contrastive learning for medical imaging
Alif Elham Khan
TL;DR
Standard self-supervised learning on medical images ignores the hierarchical structure and semantics of the labels. This paper introduces two plug-in objectives, Hierarchy-Weighted Contrastive (HWC) and Level-Aware Margin (LAM), which inject the label tree into the contrastive objective and into prototype-based margins, respectively. Both are geometry-agnostic and work with Euclidean or hyperbolic embeddings. HWC reshapes the softmax competition with pair-specific multipliers based on shared ancestry, while LAM enforces inter-level separation through level-wise prototypes and margins. Together they improve hierarchy-faithful metrics (HF1, H-Acc) and reduce parent-distance violations without sacrificing top-1 accuracy. Across several datasets with varying taxonomies, including BreakHis, HAM-10K, ODIR-5K, iNaturalist, and DeepFashion In-Shop, the methods yield consistently better taxonomy alignment; hyperbolic variants provide extra gains on deeper trees, and Euclidean variants serve as strong, practical plug-ins. The result is a simple, general recipe for learning medically relevant, hierarchy-respecting image representations that improve both interpretability and performance.
Abstract
Medical image labels are often organized by taxonomies (e.g., organ - tissue - subtype), yet standard self-supervised learning (SSL) ignores this structure. We present a hierarchy-preserving contrastive framework that makes the label tree a first-class training signal and an evaluation target. Our approach introduces two plug-in objectives: Hierarchy-Weighted Contrastive (HWC), which scales positive/negative pair strengths by shared ancestors to promote within-parent coherence, and Level-Aware Margin (LAM), a prototype margin that separates ancestor groups across levels. The formulation is geometry-agnostic and applies to Euclidean and hyperbolic embeddings without architectural changes. Across several benchmarks, including breast histopathology, the proposed objectives consistently improve representation quality over strong SSL baselines while better respecting the taxonomy. We evaluate with metrics tailored to hierarchy faithfulness: HF1 (hierarchical F1), H-Acc (tree-distance-weighted accuracy), and parent-distance violation rate. We also report top-1 accuracy for completeness. Ablations show that HWC and LAM are effective even without curvature, and combining them yields the most taxonomy-aligned representations. Taken together, these results provide a simple, general recipe for learning medical image representations that respect the label tree and advance both performance and interpretability in hierarchy-rich domains.
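To make the HF1 metric named above concrete, the following is a minimal sketch of one common way to compute a hierarchical F1: each label is expanded to itself plus its ancestors, and precision and recall are measured on the overlap of the expanded sets. This is an assumption about the metric's form, not the paper's stated definition. The toy taxonomy, the `PARENT` mapping, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy taxonomy (child -> parent); the root "breast" has no entry.
PARENT = {
    "benign": "breast",
    "malignant": "breast",
    "adenosis": "benign",
    "fibroadenoma": "benign",
    "ductal_carcinoma": "malignant",
    "lobular_carcinoma": "malignant",
}


def ancestors_inclusive(label, parent=PARENT):
    """Return the label and all of its ancestors, excluding the root."""
    nodes = set()
    while label in parent:  # stops at the root, which is shared by every label
        nodes.add(label)
        label = parent[label]
    return nodes


def hierarchical_f1(y_true, y_pred, parent=PARENT):
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets.

    Assumed definition (set-overlap form); the paper may define HF1 differently.
    """
    overlap = pred_total = true_total = 0
    for t, p in zip(y_true, y_pred):
        t_set = ancestors_inclusive(t, parent)
        p_set = ancestors_inclusive(p, parent)
        overlap += len(t_set & p_set)
        pred_total += len(p_set)
        true_total += len(t_set)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = ["adenosis", "ductal_carcinoma"]
    y_pred = ["fibroadenoma", "adenosis"]  # first error stays under "benign"
    print(hierarchical_f1(y_true, y_pred))
```

Under this definition, a prediction that lands on a sibling under the correct parent earns partial credit, while one that crosses to a different parent earns none. That is the behavior a hierarchy-faithful metric is meant to reward, and top-1 accuracy would score both mistakes identically.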
