A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection

Kaiyu Li, Xiangyong Cao, Deyu Meng

TL;DR

Remote sensing change detection (CD) is often limited by scarce labeled data. The authors introduce the Bi-Temporal Adapter Network (BAN), a universal framework that keeps a foundation model frozen, extracts task- and domain-specific features with a bi-temporal adapter branch (Bi-TAB), and uses bridging modules together with ARIS to inject the foundation model's general features into the CD backbone, transferring knowledge with only a few learnable parameters. BAN delivers consistent gains across binary CD (BCD), semantic CD (SCD), cross-domain, and semi-supervised tasks, which supports foundation-model-based CD as a viable approach for remote sensing (RS) data. The framework is extensible: it can adopt stronger foundation models as they appear and can be extended to multispectral domains, reducing the amount of labeled data needed for high-performance CD.

Abstract

Change detection (CD) is a critical task to observe and analyze dynamic processes of land cover. Although numerous deep learning-based CD models have performed excellently, their further performance improvements are constrained by the limited knowledge extracted from the given labelled data. On the other hand, the foundation models that emerged recently contain a huge amount of knowledge by scaling up across data modalities and proxy tasks. In this paper, we propose a Bi-Temporal Adapter Network (BAN), which is a universal foundation model-based CD adaptation framework aiming to extract the knowledge of foundation models for CD. The proposed BAN contains three parts, i.e., a frozen foundation model (e.g., CLIP), a bi-temporal adapter branch (Bi-TAB), and bridging modules between them. Specifically, BAN extracts general features through a frozen foundation model, which are then selected, aligned, and injected into Bi-TAB via the bridging modules. Bi-TAB is designed as a model-agnostic concept to extract task/domain-specific features, which can be either an existing arbitrary CD model or some hand-crafted stacked blocks. Beyond current customized models, BAN is the first extensive attempt to adapt the foundation model to the CD task. Experimental results show the effectiveness of our BAN in improving the performance of existing CD methods (e.g., up to 4.08% IoU improvement) with only a few additional learnable parameters. More importantly, these successful practices show us the potential of foundation models for remote sensing CD. The code is available at https://github.com/likyoo/BAN and will be supported in our Open-CD.
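
The data flow described in the abstract (frozen features that are selected, aligned, and injected into a bi-temporal branch) can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D "feature maps", the names `frozen_encoder`, `resize_nearest`, `Bridge`, and `bi_tab`, the nearest-neighbour alignment, the additive injection with a single `scale` weight, and the absolute-difference change score are all assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import List

Feature = List[float]  # toy 1-D stand-in for a feature map


def frozen_encoder(image: Feature) -> List[Feature]:
    """Hypothetical stand-in for a frozen foundation model.

    The "weights" are constants and are never updated. It returns features
    at two depths with different resolutions.
    """
    shallow = [2.0 * x for x in image]  # same length as the input
    deep = [shallow[i] + shallow[i + 1]  # half the length of the input
            for i in range(0, len(shallow) - 1, 2)]
    return [shallow, deep]


def resize_nearest(feat: Feature, size: int) -> Feature:
    """Nearest-neighbour resize, used here as the 'align' step (assumption)."""
    n = len(feat)
    return [feat[min(n - 1, i * n // size)] for i in range(size)]


@dataclass
class Bridge:
    """Toy bridging module: select a layer, align its size, inject by addition."""
    layer: int    # which foundation-model feature to select
    scale: float  # the only learnable value in this sketch (assumption)

    def __call__(self, fm_feats: List[Feature], tab_feat: Feature) -> Feature:
        selected = fm_feats[self.layer]                      # select
        aligned = resize_nearest(selected, len(tab_feat))    # align
        return [t + self.scale * a for t, a in zip(tab_feat, aligned)]  # inject


def bi_tab(img_t1: Feature, img_t2: Feature, bridge: Bridge) -> Feature:
    """Toy bi-temporal branch: the same path is applied to both dates,
    then the two results are compared element-wise."""
    h1 = bridge(frozen_encoder(img_t1), list(img_t1))
    h2 = bridge(frozen_encoder(img_t2), list(img_t2))
    return [abs(a - b) for a, b in zip(h1, h2)]


change = bi_tab([0, 0, 1, 1], [0, 0, 1, 3], Bridge(layer=1, scale=0.5))
print(change)
```

In this sketch only `Bridge.scale` would be trained, which mirrors the idea of adapting with few additional learnable parameters while the foundation model stays fixed.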

Paper Structure

This paper contains 23 sections, 13 equations, 5 figures, 9 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Schematic diagram of BAN. The frozen pre-trained model can be any foundation model (ImageNet-21k pre-trained models [steiner2021train], CLIP [radford2021learning, cherti2023reproducible], RemoteCLIP [liu2023remoteclip], etc.), the Bi-TAB can be any CD model (BiT [chen2021remote], ChangeFormer [bandara2022transformer], etc.), and the bridging modules inject the knowledge extracted from the foundation model into the Bi-TAB.
  • Figure 2: Detailed illustration of BAN. The foundation model (blue area) accumulates general knowledge through pre-training techniques, and the bridging modules (orange area) select, align, and inject this knowledge into the Bi-TAB. The Bi-TAB (green area) is a model-agnostic concept, which can be an arbitrary customized CD model or even some hand-crafted stacked blocks. These three major components are detailed in the paper's sections on the foundation model, the Bi-TAB, and the bridging modules, respectively.
  • Figure 3: Illustration of the Bi-TAB perspective in BAN, with BiT [chen2021remote] and ChangeFormer [bandara2022transformer] as examples. For better presentation, the color renderings follow their original literature [chen2021remote, bandara2022transformer].
  • Figure 4: Visualization results of different methods on the LEVIR-CD testing set. The rendered colors represent true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
  • Figure 5: Visualization results (semantic segmentation auxiliary task) of different methods on the BANDON dataset.
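
The TP/FP/FN/TN categories used in Figure 4 are also the ingredients of the IoU metric quoted in the abstract. As a minimal sketch, assuming flattened binary change masks (1 = changed, 0 = unchanged) and the standard definition IoU = TP / (TP + FP + FN) for the change class, the counts and the score can be computed as follows. The function names and the choice to return 0.0 for an empty denominator are illustrative assumptions, not taken from the paper's code.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN, TN for two flattened binary change masks."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1   # predicted change, real change
        elif p and not t:
            fp += 1   # predicted change, no real change
        elif not p and t:
            fn += 1   # missed a real change
        else:
            tn += 1   # correctly predicted no change
    return tp, fp, fn, tn


def iou(tp: int, fp: int, fn: int) -> float:
    """IoU of the change class; true negatives do not enter the formula."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0  # 0.0 for the empty case is a convention chosen here


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(pred, target)
print(tp, fp, fn, tn, iou(tp, fp, fn))
```

Because TN is excluded, IoU is not inflated by the large unchanged background that is typical of change detection imagery.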