Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis
Chenyang Liu, Keyan Chen, Haotian Zhang, Zipeng Qi, Zhengxia Zou, Zhenwei Shi
TL;DR
This work introduces an interactive Change-Agent that jointly performs pixel-level change detection and semantic-level change captioning for remote sensing imagery. Its core is a multi-level change interpretation (MCI) model built on BI-temporal Iterative Interaction (BI3) layers and trained on the LEVIR-MCI dataset. A large language model (LLM) orchestrates the MCI model so that users can drive the analysis with instructions and request tasks that go beyond perception, such as change object counting and change cause analysis. The approach achieves state-of-the-art performance on both detection and captioning metrics and demonstrates qualitative advantages in interpreting complex surface changes. Together, the MCI+LLM framework and the LEVIR-MCI dataset open a new avenue for comprehensive, interactive remote sensing change interpretation and analysis.
Abstract
Monitoring changes in the Earth's surface is crucial for understanding natural processes and human impacts, and it requires precise and comprehensive interpretation methods. Remote sensing satellite imagery offers a unique perspective for monitoring these changes, which has made remote sensing image change interpretation (RSICI) a significant research focus. Current RSICI technology encompasses change detection and change captioning, each of which has limitations in providing comprehensive interpretation. To address this, we propose an interactive Change-Agent, which can follow user instructions to achieve comprehensive change interpretation and insightful analysis, including change detection, change captioning, change object counting, and change cause analysis. The Change-Agent integrates a multi-level change interpretation (MCI) model as the eyes and a large language model (LLM) as the brain. The MCI model contains two branches, one for pixel-level change detection and one for semantic-level change captioning, in which the BI-temporal Iterative Interaction (BI3) layer is proposed to enhance the model's discriminative feature representation capabilities. To support the training of the MCI model, we build the LEVIR-MCI dataset with a large number of change masks and change captions. Experiments demonstrate the state-of-the-art (SOTA) performance of the MCI model in performing change detection and change description simultaneously, and highlight the promising application value of our Change-Agent in facilitating comprehensive interpretation of surface changes, which opens up a new avenue for intelligent remote sensing applications. To facilitate future research, we will make our dataset and the codebase of the MCI model and Change-Agent publicly available at https://github.com/Chen-Yang-Liu/Change-Agent
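The "eyes and brain" division of labor described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: `stub_mci_model`, `toy_planner`, and `run_agent` are hypothetical names, the mask and caption are hard-coded, and a keyword router stands in for the LLM. Counting changed objects as connected regions of the change mask is likewise an assumption made for illustration.

```python
from collections import deque
from typing import List

Mask = List[List[int]]  # 1 = changed pixel, 0 = unchanged pixel


def stub_mci_model(image_before, image_after):
    """Hypothetical stand-in for the MCI model ("eyes").

    A real model would predict a change mask and a caption from the two
    images; here both outputs are fixed so the sketch runs without weights.
    """
    mask = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0],
    ]
    caption = "several buildings appear in the scene"
    return mask, caption


def count_changed_objects(mask: Mask) -> int:
    """Count 4-connected regions of changed pixels with breadth-first search."""
    if not mask or not mask[0]:
        return 0
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            count += 1  # found a new, unvisited changed region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def toy_planner(instruction: str) -> str:
    """Keyword router standing in for the LLM "brain" (an assumption)."""
    text = instruction.lower()
    if "how many" in text or "count" in text:
        return "count"
    if "describe" in text or "caption" in text:
        return "caption"
    return "detect"


def run_agent(instruction: str, image_before=None, image_after=None):
    """Run the perception stub once, then return what the planner selects."""
    mask, caption = stub_mci_model(image_before, image_after)
    tools = {
        "detect": lambda: mask,
        "caption": lambda: caption,
        "count": lambda: count_changed_objects(mask),
    }
    return tools[toy_planner(instruction)]()


if __name__ == "__main__":
    print(run_agent("How many buildings changed?"))
    print(run_agent("Describe the changes"))
```

The point of the sketch is the separation of concerns: the perception model produces both a pixel-level mask and a semantic-level caption, while a separate planner decides which output, or which post-processing step over it, answers the user's instruction.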
