Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis

Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi

TL;DR

This work introduces an interactive Change-Agent that jointly performs pixel-level change detection and semantic-level change captioning for remote sensing imagery. Its core is a multi-level change interpretation (MCI) model with BI-temporal Iterative Interaction (BI3) layers. The MCI model is trained on the LEVIR-MCI dataset and orchestrated by a large language model (LLM), which enables user-driven analysis and tasks beyond perception. The approach achieves state-of-the-art performance on both detection and captioning metrics and shows qualitative advantages in interpreting complex surface changes. It also supports interactive analyses such as counting changed objects and reasoning about the causes of changes. Together, the MCI+LLM framework and the LEVIR-MCI dataset open a new avenue for comprehensive, interactive remote sensing change interpretation and analysis.

Abstract

Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, which calls for precise and comprehensive interpretation methods. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, and remote sensing image change interpretation (RSICI) has therefore emerged as a significant research focus. Current RSICI technology encompasses change detection and change captioning, but each alone falls short of comprehensive interpretation. To address this, we propose an interactive Change-Agent that follows user instructions to achieve comprehensive change interpretation and insightful analysis, including change detection, change captioning, changed-object counting, and change cause analysis. The Change-Agent integrates a multi-level change interpretation (MCI) model as the eyes and a large language model (LLM) as the brain. The MCI model has two branches, one for pixel-level change detection and one for semantic-level change captioning. Within it, we propose the BI-temporal Iterative Interaction (BI3) layer to enhance the model's discriminative feature representations. To support the training of the MCI model, we build the LEVIR-MCI dataset, which provides a large number of change masks and change captions. Experiments show that the MCI model achieves state-of-the-art (SOTA) performance on change detection and change captioning simultaneously. They also highlight the promising application value of the Change-Agent for comprehensive interpretation of surface changes, opening a new avenue for intelligent remote sensing applications. To facilitate future research, we will make our dataset and the codebase of the MCI model and Change-Agent publicly available at https://github.com/Chen-Yang-Liu/Change-Agent
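The division of labor between the "eyes" and the "brain" can be illustrated with a toy sketch. Everything in it is hypothetical. `fake_mci` stands in for the MCI model and returns a fixed binary change mask and caption. `answer` is a keyword router that replaces the LLM; in the actual Change-Agent, the LLM plans tool calls from the user's instruction. Counting labels 4-connected components in the change mask. This is one plausible way to turn a pixel-level mask into an object count, not necessarily the authors' procedure.

```python
from collections import deque

Mask = list[list[int]]  # hypothetical binary change mask: 1 = changed, 0 = unchanged


def fake_mci(img_t1: str, img_t2: str) -> tuple[Mask, str]:
    """Hypothetical stand-in for the MCI model ('eyes'); ignores its inputs."""
    mask = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0],
    ]
    return mask, "some buildings are constructed along the road"


def count_objects(mask: Mask) -> int:
    """Count 4-connected regions of changed pixels with a breadth-first flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                count += 1  # found an unvisited changed pixel: a new object
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def answer(query: str, img_t1: str, img_t2: str, perceive=fake_mci) -> str:
    """Keyword router standing in for the LLM 'brain' that chooses which tool to use."""
    mask, caption = perceive(img_t1, img_t2)
    q = query.lower()
    if "how many" in q or "count" in q:
        return f"{count_objects(mask)} changed objects detected"
    if "describe" in q or "what changed" in q:
        return caption
    return f"{sum(map(sum, mask))} changed pixels"  # fall back to the raw detection result


print(answer("How many objects changed?", "t1.png", "t2.png"))  # 3 changed objects detected
```

With the toy mask, 4-connectivity (no diagonal neighbors) yields three regions: the 2x2 block, the vertical pair in the last column, and the isolated pixel in the bottom row. A real agent would let the LLM decide which tool outputs to combine rather than relying on fixed keywords.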

Paper Structure (24 sections, 8 equations, 14 figures, 6 tables)

Figures (14)

  • Figure 1: Comparison between previous single technologies and our Change-Agent. Our Change-Agent can simultaneously achieve precise pixel-level change detection and semantic-level change captioning. In addition, it is interactive, allowing users to pose queries about surface changes.
  • Figure 2: The comparison of the proposed LEVIR-MCI dataset and the previous LEVIR-CC dataset. The LEVIR-MCI dataset is an extension of LEVIR-CC.
  • Figure 3: Examples from the LEVIR-MCI dataset. Each pair of bi-temporal images comes with a change detection mask and five sentences describing the changes; one sentence is shown for each pair. In the change detection mask, changed buildings are highlighted in red and changed roads in yellow.
  • Figure 4: Distribution of the scale and deformation of changed roads and buildings. The "area" on the horizontal axis is the area of a single object, and the "bbox_area" on the vertical axis is the area of its rectangular bounding box. The color of each bin indicates the number of object instances. The spread of points reflects the diversity of object scales and deformations.
  • Figure 5: (a) Overview of the Change-Agent, which is equipped with an MCI model and an LLM serving as its eyes and brain, respectively. The proposed LEVIR-MCI dataset provides the data foundation for training the MCI model. (b) Overall structure of the MCI model.
  • ...and 9 more figures
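The two axes of Figure 4 can be computed from a color-coded change mask as in the following sketch. It assumes pure red `(255, 0, 0)` for buildings and pure yellow `(255, 255, 0)` for roads, because the Figure 3 caption names only the colors, not their RGB values. It also treats each 4-connected region of one color as an object instance. `instance_stats` is a hypothetical helper, not part of the released codebase.

```python
from collections import deque

RGB = tuple[int, int, int]

# Assumed color coding (Figure 3 names the colors, not their exact RGB values).
CLASS_COLORS: dict[RGB, str] = {(255, 0, 0): "building", (255, 255, 0): "road"}


def instance_stats(mask: list[list[RGB]]) -> list[tuple[str, int, int]]:
    """Return (class, area, bbox_area) for every 4-connected object in the mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    stats = []
    for r in range(h):
        for c in range(w):
            cls = CLASS_COLORS.get(mask[r][c])
            if cls is None or seen[r][c]:
                continue  # background pixel or already assigned to an instance
            color = mask[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            area, y0, y1, x0, x1 = 0, r, r, c, c
            while queue:  # breadth-first flood fill over same-colored pixels
                y, x = queue.popleft()
                area += 1
                y0, y1, x0, x1 = min(y0, y), max(y1, y), min(x0, x), max(x1, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == color:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            bbox_area = (y1 - y0 + 1) * (x1 - x0 + 1)  # area of the enclosing rectangle
            stats.append((cls, area, bbox_area))
    return stats


R, Y, K = (255, 0, 0), (255, 255, 0), (0, 0, 0)
example = [
    [R, R, K, Y],
    [R, R, K, Y],
    [K, K, K, Y],
    [Y, Y, Y, Y],
]
print(instance_stats(example))  # [('building', 4, 4), ('road', 7, 16)]
```

In this toy example, the square building fills all 4 of its bounding-box pixels, while the L-shaped road fills only 7 of 16. A low area-to-bbox_area ratio therefore offers one way to read the elongated or bent shapes that push points away from the diagonal in Figure 4.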