ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning

Pei Deng, Wenqian Zhou, Hanlin Wu

TL;DR

This work introduces ChangeChat, the first bitemporal vision-language model (VLM) designed specifically for interactive remote sensing (RS) change analysis. It achieves performance comparable to or better than state-of-the-art (SOTA) methods on specific tasks and significantly outperforms the general-domain model GPT-4.

Abstract

Remote sensing (RS) change analysis is vital for monitoring Earth's dynamic processes by detecting changes between images of the same area acquired at different times. Traditional change detection excels at identifying pixel-level changes but cannot contextualize them. Recent change captioning methods provide natural language descriptions of changes, but they do not support interactive, user-specific queries. To address these limitations, we introduce ChangeChat, the first bitemporal vision-language model (VLM) designed specifically for RS change analysis. ChangeChat uses multimodal instruction tuning to handle complex queries such as change captioning, category-specific quantification, and change localization. To improve the model's performance, we built the ChangeChat-87k dataset using a combination of rule-based methods and GPT-assisted techniques. Experiments show that ChangeChat offers a comprehensive, interactive solution for RS change analysis: it achieves performance comparable to or better than state-of-the-art (SOTA) methods on specific tasks and significantly surpasses the general-domain model GPT-4. Code and pre-trained weights are available at https://github.com/hanlinwu/ChangeChat.
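The abstract states that ChangeChat-87k was built partly with rule-based methods and lists category-specific quantification and change localization among the supported queries. As an illustration only, the sketch below shows how template-based question-answer pairs for those two query types could be derived from a labeled change mask. Everything here is an assumption, not the authors' pipeline: the mask format (0 = unchanged, positive integers = category of the changed object), the `CATEGORY_NAMES` mapping, the question templates, counting objects as 4-connected regions, and reporting locations as pixel bounding boxes.

```python
from collections import deque

# Hypothetical label scheme: 0 = unchanged pixel, other ints = changed-object category.
CATEGORY_NAMES = {1: "building", 2: "road"}


def connected_regions(mask, label):
    """Return (x_min, y_min, x_max, y_max) boxes of 4-connected regions carrying `label`."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != label or seen[y][x]:
                continue
            # Breadth-first flood fill over one region, tracking its extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == label:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def make_instructions(mask):
    """Turn one change mask into (question, answer) pairs using fixed (hypothetical) templates."""
    pairs = []
    for label, name in CATEGORY_NAMES.items():
        boxes = connected_regions(mask, label)
        # Category-specific quantification: count the changed objects of this category.
        pairs.append((f"How many {name}s changed between the two images?", str(len(boxes))))
        # Change localization: only ask where objects are if any exist.
        if boxes:
            answer = "; ".join(str(list(box)) for box in boxes)
            pairs.append((f"Where are the changed {name}s located?", answer))
    return pairs


if __name__ == "__main__":
    toy_mask = [
        [1, 1, 0, 0],
        [0, 0, 0, 2],
        [1, 0, 0, 2],
    ]
    for question, answer in make_instructions(toy_mask):
        print(f"{question} -> {answer}")
```

On the toy mask, the two building pixels in the top-left corner form one region and the isolated building pixel below them forms a second, so the count answer for buildings is 2, while the two vertically adjacent road pixels form a single road region. Rule-based generation like this guarantees that numeric and coordinate answers agree exactly with the labels; free-form captions are where GPT-assisted generation would be more useful.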

Paper Structure

This paper contains 12 sections, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Overview of the proposed ChangeChat. The left side illustrates the network architecture, while the right side shows examples of various types of change analysis.
  • Figure 2: Two examples of change localization, with the generated coordinates visualized.
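Because ChangeChat answers localization queries in text, its coordinates must be parsed before they can be drawn on an image as in Figure 2. The sketch below assumes a hypothetical output format in which each box appears as `[x_min, y_min, x_max, y_max]` with values normalized to [0, 1]; the actual format used by ChangeChat may differ, and `parse_boxes` and `BOX_PATTERN` are illustrative names, not part of the released code.

```python
import re

# Hypothetical output format: "[x_min, y_min, x_max, y_max]" with values in [0, 1].
# A number is digits with an optional fractional part (e.g. "1", "0.35", ".5").
NUM = r"(\d*\.?\d+)"
BOX_PATTERN = re.compile(
    r"\[\s*" + NUM + r"\s*,\s*" + NUM + r"\s*,\s*" + NUM + r"\s*,\s*" + NUM + r"\s*\]"
)


def parse_boxes(text, width, height):
    """Extract normalized boxes from generated text and scale them to pixel coordinates."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        values = [float(v) for v in match.groups()]
        # Clamp to [0, 1] so out-of-range predictions stay inside the image.
        x0, y0, x1, y1 = (min(max(v, 0.0), 1.0) for v in values)
        boxes.append((round(x0 * width), round(y0 * height),
                      round(x1 * width), round(y1 * height)))
    return boxes


if __name__ == "__main__":
    reply = "Changed buildings: [0.1, 0.2, 0.35, 0.5] and [0.6, 0.6, 0.9, 0.75]."
    print(parse_boxes(reply, 200, 200))
```

The resulting pixel boxes can then be drawn over the post-change image with any plotting library, for example Pillow's `ImageDraw.rectangle`. Clamping is a defensive choice: a generative model can emit values slightly outside the normalized range, and clamping keeps the visualization inside the image without discarding the prediction.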