DI3CL: Contrastive Learning With Dynamic Instances and Contour Consistency for SAR Land-Cover Classification Foundation Model

Zhongle Ren, Hui Ding, Kai Wang, Biao Hou, Xingyu Luo, Weibin Li, Licheng Jiao

TL;DR

This work tackles label-efficient SAR land-cover classification by developing a SAR foundation model trained with self-supervised contrastive learning. It introduces DI3CL, a framework that combines a Dynamic Instance (DI) module, which strengthens global context by enforcing local consistency across views of the same region, with a Contour Consistency (CC) module, which steers the model toward land-cover contours. The model is pre-trained on the large SARSense dataset of 460,532 SAR images. The resulting foundation model generalizes across land-cover mapping, water body detection, and road extraction, where it is reported to achieve state-of-the-art results and strong cross-task transfer. The work demonstrates the practical value of scalable, label-efficient SAR interpretation and points toward future fusion with optical data to further improve semantic understanding.

Abstract

Although significant advances have been achieved in SAR land-cover classification, recent methods remain predominantly focused on supervised learning, which relies heavily on extensive labeled datasets. This dependency not only limits scalability and generalization but also restricts adaptability to diverse application scenarios. In this paper, a general-purpose foundation model for SAR land-cover classification is developed, serving as a robust cornerstone to accelerate the development and deployment of various downstream models. Specifically, a Dynamic Instance and Contour Consistency Contrastive Learning (DI3CL) pre-training framework is presented, which incorporates a Dynamic Instance (DI) module and a Contour Consistency (CC) module. The DI module enhances global contextual awareness by enforcing local consistency across different views of the same region. The CC module leverages shallow feature maps to guide the model to focus on the geometric contours of SAR land-cover objects, thereby improving structural discrimination. Additionally, to enhance robustness and generalization during pre-training, a large-scale and diverse dataset named SARSense, comprising 460,532 SAR images, is constructed to enable the model to capture comprehensive and representative features. To evaluate the generalization capability of our foundation model, we conducted extensive experiments across a variety of SAR land-cover classification tasks, including SAR land-cover mapping, water body detection, and road extraction. The results consistently demonstrate that the proposed DI3CL outperforms existing methods. Our code and pre-trained weights are publicly available at: https://github.com/SARpre-train/DI3CL.
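For readers unfamiliar with contrastive pre-training, the sketch below shows a generic InfoNCE-style loss over two views of the same batch. It is an illustration only and is not taken from the DI3CL code: the abstract does not give the loss, so the function names `l2_normalize` and `info_nce` and the temperature value are assumptions, and the DI and CC modules are not modeled here.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.2):
    """Mean InfoNCE loss; queries[i] and keys[i] are the positive pair.

    Hypothetical helper for illustration. The temperature of 0.2 is an
    assumed default, not a value reported for DI3CL.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i with every key, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching key as the target class.
        total += log_denominator - logits[i]
    return total / len(q)


# Embeddings of two views that agree give a small loss; mismatched views give a large one.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(view_a, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(view_a, [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as each query becomes more similar to its own key than to the other keys in the batch, which is the basic signal that contrastive frameworks of this kind build on.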

Paper Structure

This paper contains 34 sections, 7 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Illustration of the issues with existing self-supervised contrastive learning for SAR land-cover classification tasks, along with our improvements. The visualization results are generated using the Attention-Gradient Class Activation Mapping (A-GCAM) method [AGCAM]. (a) and (b) show how contrastive learning for natural images enables the model to focus on foreground instances, while (c) and (d) illustrate that, when applied to SAR land-cover classification tasks, the model focuses only on the central region (e.g., the village) while ignoring the other regions (e.g., the forest). (e) demonstrates that, after adding the Dynamic Instance module, the model can focus on the entire image, and (f) shows that, with the further addition of the Contour Consistency module, the model not only attends to the entire image but also maintains contours consistent with the land-cover objects.
  • Figure 2: Illustrative samples from the SARSense dataset: (a) samples of village and farmland under different spatial resolutions, demonstrating the diversity of resolution; (b) samples of city areas acquired using different polarization modes, demonstrating the diversity of polarization; (c) samples of various land-cover categories, demonstrating the diversity of land-cover types.
  • Figure 3: The overall workflow of the proposed DI3CL framework, where IS denotes the intersection region shared by the two augmented views.
  • Figure 4: Visualization results of our method and the compared CL methods on a downstream SAR land-cover classification dataset.
  • Figure 5: Visualization results of our method and the compared methods on the SAR land-cover mapping dataset. (a) SAR. (b) GT. (c) DANet. (d) PSPNet. (e) DeeplabV3+. (f) UperNet. (g) FarSeg. (h) UperNet-SeCo. (i) UperNet-RSP. (j) UperNet-SMLFR. (k) UperNet-DI3CL-R50. (l) UperNet-DI3CL-R101.
  • ...and 5 more figures
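
The Figure 3 caption refers to IS, the intersection region shared by the two augmented views. As a minimal sketch of how such a region could be located when both views are random crops of one source image, the code below intersects two crop boxes and maps the result into each view's own coordinates. The box format `(left, top, right, bottom)` and the helper names `crop_intersection` and `to_local` are assumptions for illustration; they are not taken from the DI3CL code.

```python
def crop_intersection(box_a, box_b):
    """Return the overlap of two (left, top, right, bottom) boxes, or None.

    Coordinates are pixels in the source image; right and bottom are exclusive.
    """
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])
    if right <= left or bottom <= top:
        return None  # The two crops do not overlap.
    return (left, top, right, bottom)


def to_local(crop_box, region):
    """Express a source-image region in the coordinate frame of one crop."""
    offset_x, offset_y = crop_box[0], crop_box[1]
    return (
        region[0] - offset_x,
        region[1] - offset_y,
        region[2] - offset_x,
        region[3] - offset_y,
    )


# Two overlapping 100x100 crops of the same source image (example values).
view_1 = (0, 0, 100, 100)
view_2 = (50, 40, 150, 140)
shared = crop_intersection(view_1, view_2)
shared_in_view_1 = to_local(view_1, shared)
shared_in_view_2 = to_local(view_2, shared)
```

With the shared region known in each view's coordinates, features from the same ground area can be compared across the two views. Flips or resizing applied after cropping would need the same transform applied to the boxes, which this sketch leaves out.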