
MAPLE: Multi-Path Adaptive Propagation with Level-Aware Embeddings for Hierarchical Multi-Label Image Classification

Boshko Koloski, Marjan Stoimchev, Jurica Levatić, Dragi Kocev, Sašo Džeroski

Abstract

Hierarchical multi-label classification (HMLC) is essential for modeling structured label dependencies in remote sensing. Yet existing approaches struggle in multi-path settings, where images may activate multiple taxonomic branches, leading to underuse of hierarchical information. We propose MAPLE (Multi-Path Adaptive Propagation with Level-Aware Embeddings), a framework that integrates (i) hierarchical semantic initialization from graph-aware textual descriptions, (ii) graph-based structure encoding via graph convolutional networks (GCNs), and (iii) adaptive multi-modal fusion that dynamically balances semantic priors and visual evidence. An adaptive level-aware objective automatically selects appropriate losses per hierarchy level. Evaluations on CORINE-aligned remote sensing datasets (AID, DFC-15, and MLRSNet) show consistent improvements of up to +42% in few-shot regimes while adding only 2.6% parameter overhead, demonstrating that MAPLE effectively and efficiently models hierarchical semantics for Earth observation (EO).


Paper Structure

This paper contains 29 sections, 4 equations, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: MAPLE architecture overview. The framework processes an input image with a ViT encoder that uses hierarchy-specific class tokens. A GCN refines these token embeddings by propagating information along the label taxonomy. Finally, visual features and refined semantic embeddings are fused via adaptive gating to produce level-aware classifications.
  • Figure 2: Examples of constructed label hierarchies for (a) the AID dataset, derived from the CORINE Land Cover (CLC) nomenclature, and (b) the MuRed dataset, with abbreviations in leaf labels corresponding to ICD-10 codes of disease names.
  • Figure 3: Representative examples from the nine datasets used in our evaluation, showing sample images alongside their corresponding hierarchical label structures. The datasets span three domains: remote sensing (AID, DFC-15, MLRSNet), medical imaging (MuRed, HPA, PadChest), and fine-grained visual categorization (OxfordPets-37, Stanford Cars, ETHEC). Each hierarchy displays the multi-level taxonomic organization from coarse-grained categories at Level 1 to fine-grained leaf labels.
  • Figure 4: Hierarchical semantic initialization using the Sentence Transformer on a CORINE-derived path. Left: subgraph with the active node (ship) highlighted. Right: instantiated prompts for parent and leaf nodes used for semantic embedding generation.
  • Figure 5: Few-shot learning performance comparison between MAPLE (in green) and flat MLC baseline (in red) across representative datasets. Results show AU$\overline{\textrm{PRC}}$ performance ($\mu \pm \sigma$ across three experimental repeats) for varying numbers of shots per category ($K \in \{4, 8, 12, 16\}$). MAPLE consistently outperforms the baseline, with particularly pronounced benefits in low-data regimes.
  • ...and 4 more figures
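
The adaptive gating fusion described in the abstract and in Figure 1 — dynamically balancing semantic priors against visual evidence — can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name, the single scalar gate per class token, and the linear gating parameterization are all assumptions for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fusion(visual, semantic, W, b):
    """Gate between a visual feature vector and its GCN-refined
    semantic embedding for one class token.

    visual, semantic : (d,) feature vectors.
    W : (2d,) gating weights, b : scalar bias. A scalar gate per
    token is a simplifying assumption; MAPLE's actual gating may be
    parameterized differently.
    """
    # Gate value in (0, 1): how much to trust the visual evidence.
    g = sigmoid(np.concatenate([visual, semantic]) @ W + b)
    # Convex combination of the two modalities.
    return g * visual + (1.0 - g) * semantic
```

With untrained (zero) gating parameters the gate is 0.5, so the fusion reduces to a plain average of the two modalities; training the gate lets the model lean on semantic priors when visual evidence is weak (e.g. in few-shot regimes).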