A Semantics-Aware Hierarchical Self-Supervised Approach to Classification of Remote Sensing Images

Giulio Weikmann, Gianmarco Perantoni, Lorenzo Bruzzone

TL;DR

This work tackles the underutilization of hierarchical semantics in remote sensing image classification by introducing Semantics-Aware Hierarchical Consensus (SAHC). SAHC attaches multiple hierarchy-specific heads to backbones, learns cross-level relationships via trainable projection matrices, and enforces cross-level agreement through a self-supervised consensus loss that operates in the log domain. The approach yields state-of-the-art performance across VHR scene classification and multispectral time-series segmentation on multiple datasets, and it demonstrates robustness to label noise by discovering data-driven hierarchies that can extend beyond user-defined mappings. Its modular design and demonstrated backbone-agnostic adaptability offer practical impact for scalable, hierarchy-aware RS analysis, with potential extensions to learn hierarchies directly from data.

Abstract

Deep learning has become increasingly important in remote sensing image classification due to its ability to extract semantic information from complex data. Classification tasks often include predefined label hierarchies that represent the semantic relationships among classes. However, these hierarchies are frequently overlooked, and most approaches focus only on fine-grained classification schemes. In this paper, we present a novel Semantics-Aware Hierarchical Consensus (SAHC) method for learning hierarchical features and relationships by integrating hierarchy-specific classification heads within a deep network architecture, each specialized in a different degree of class granularity. The proposed approach employs trainable hierarchy matrices, which guide the network in learning the hierarchical structure in a self-supervised manner. Furthermore, we introduce a hierarchical consensus mechanism to ensure consistent probability distributions across different hierarchical levels. This mechanism acts as a weighted ensemble that leverages the inherent structure of the hierarchical classification task. The proposed SAHC method is evaluated on three benchmark datasets with different degrees of hierarchical complexity and on different tasks, using distinct backbone architectures to demonstrate its adaptability. Experimental results show both the effectiveness of the proposed approach in guiding network learning and the robustness of the hierarchical consensus for remote sensing image classification tasks.
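
To make the idea of a trainable hierarchy matrix and a log-domain consensus term more concrete, the following is a minimal NumPy sketch. It is not the authors' formulation, which this summary does not spell out. It assumes one fine-grained and one coarse-grained head, treats the hierarchy matrix as row-wise logits of p(coarse | fine), and uses a KL divergence as the agreement measure. The names `project_fine_to_coarse`, `consensus_loss`, and `hierarchy_logits` are hypothetical.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    return x - np.log(np.sum(np.exp(x), axis=axis, keepdims=True))

def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def project_fine_to_coarse(fine_logits, hierarchy_logits):
    # fine_logits: (batch, n_fine), output of the fine-grained head.
    # hierarchy_logits: (n_fine, n_coarse), a trainable matrix (assumed form).
    log_p_fine = log_softmax(fine_logits, axis=-1)
    # Assumption: each row is normalized to log p(coarse | fine).
    log_cond = log_softmax(hierarchy_logits, axis=-1)
    # Log-joint over (fine, coarse), then marginalize out the fine classes.
    log_joint = log_p_fine[:, :, None] + log_cond[None, :, :]
    return logsumexp(log_joint, axis=1)  # (batch, n_coarse)

def consensus_loss(coarse_logits, fine_logits, hierarchy_logits):
    # Assumption: agreement is measured as KL(coarse head || projected fine head).
    log_p_coarse = log_softmax(coarse_logits, axis=-1)
    log_p_proj = project_fine_to_coarse(fine_logits, hierarchy_logits)
    kl = np.sum(np.exp(log_p_coarse) * (log_p_coarse - log_p_proj), axis=-1)
    return float(np.mean(kl))
```

In this sketch the loss is zero when the coarse head already matches the distribution obtained by projecting the fine-grained predictions through the hierarchy matrix, and it grows as the two levels disagree. In a real training setup the matrix would be a learnable parameter updated together with the heads in an autodiff framework.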

Paper Structure

This paper contains 25 sections, 19 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Overview of the proposed hierarchical structure within the backbone (left) and associated hierarchical label tree (right). Different classification modules are trained to classify different hierarchical levels from the features extracted by the backbone.
  • Figure 2: Illustration of both direct and distant hierarchical mappings, from fine-grained classes to coarse-grained classes.
  • Figure 3: A semicircular hierarchical label tree representation of the Emilia LU dataset at the fine-grained level of the hierarchy. The leaves corresponding to the fine-grained classes are shown in green, while the nodes for intermediate classes are depicted in blue, and those for coarse-grained classes are shown in red. The root node is represented in black.
  • Figure 4: Example of qualitative results at the fine-grained level on the HRLC-CCI dataset obtained by the considered methods using the Swin Transformer backbone.
  • Figure 5: Heatmap of the estimated hierarchy log-joint matrices considering the Swin Transformer backbone on the ELU dataset from fine-grained classes to intermediate classes. The blue and green rectangles identify the expected hierarchical aggregation of the fine-grained classes at the intermediate and coarse levels, respectively.
  • ...and 1 more figure