
ChangeMinds: Multi-task Framework for Detecting and Describing Changes in Remote Sensing

Yuduo Wang, Weikang Yu, Michael Kopp, Pedram Ghamisi

TL;DR

ChangeMinds is a novel unified multi-task framework that jointly optimizes change detection (CD) and change captioning (CC) within a single end-to-end model. It introduces a multi-task predictor with a cross-attention mechanism that strengthens the interaction between image and text features, enabling efficient simultaneous learning and processing for both tasks.

Abstract

Recent advances in Remote Sensing (RS) Change Detection (CD) and Change Captioning (CC) have achieved substantial success by adopting deep learning techniques. Despite this progress, existing methods often handle CD and CC independently, which leads to inefficiencies due to the lack of synergistic processing. In this paper, we present ChangeMinds, a novel unified multi-task framework that jointly optimizes CD and CC within a single end-to-end model. We propose the change-aware long short-term memory module (ChangeLSTM) to capture complex spatiotemporal dynamics from the extracted bi-temporal deep features, yielding universal change-aware representations that serve both the CC and CD tasks. Furthermore, we introduce a multi-task predictor with a cross-attention mechanism that strengthens the interaction between image and text features, enabling efficient simultaneous learning and processing for both tasks. Extensive evaluations on the LEVIR-MCI dataset and other standard benchmarks show that ChangeMinds surpasses existing methods in multi-task learning settings and markedly improves performance on the individual CD and CC tasks. Code and pre-trained models will be made available online.
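
The abstract describes ChangeLSTM only at a high level. As a rough illustration of how a recurrent module can turn bi-temporal features into a single change-aware representation, the following NumPy sketch runs a standard LSTM cell over the two-step sequence (pre-change, post-change) at every spatial location, with weights shared across locations. This is a generic LSTM, not the ChangeLSTM design shown in Figure 4; the function names, gate ordering, and feature sizes are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step applied independently at every spatial location.

    x: (N, C_in) inputs; h, c: (N, C_h) hidden and cell states;
    W: (C_in, 4*C_h), U: (C_h, 4*C_h), b: (4*C_h,) stacked gate parameters
    (gate order i, f, o, g is an arbitrary choice for this sketch).
    """
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def change_aware_features(feat_t1, feat_t2, W, U, b):
    """Hypothetical recurrent fusion (not the paper's ChangeLSTM): run the LSTM
    over [t1, t2] per pixel; the final hidden state is a shared feature map
    that CD and CC heads could both consume.

    feat_t1, feat_t2: (H, Wd, C_in) bi-temporal feature maps -> (H, Wd, C_h)
    """
    H, Wd, C = feat_t1.shape
    C_h = U.shape[0]
    h = np.zeros((H * Wd, C_h))
    c = np.zeros((H * Wd, C_h))
    for feat in (feat_t1, feat_t2):  # temporal order matters: pre-change first
        h, c = lstm_step(feat.reshape(-1, C), h, c, W, U, b)
    return h.reshape(H, Wd, C_h)

# Usage with random features standing in for Siamese-encoder outputs.
rng = np.random.default_rng(0)
C_in, C_h = 16, 32
feat_t1 = rng.standard_normal((8, 8, C_in))
feat_t2 = rng.standard_normal((8, 8, C_in))
W = 0.1 * rng.standard_normal((C_in, 4 * C_h))
U = 0.1 * rng.standard_normal((C_h, 4 * C_h))
b = np.zeros(4 * C_h)
fused = change_aware_features(feat_t1, feat_t2, W, U, b)
print(fused.shape)  # (8, 8, 32)
```

Because the LSTM consumes the two images in order, swapping the pre- and post-change inputs generally yields different features, which is one way a recurrent fusion can encode the direction of change rather than only its magnitude.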

Paper Structure

This paper contains 30 sections, 15 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Multi-task learning framework overview for detecting and describing changes from bi-temporal RS images.
  • Figure 2: Common Model Architectures for CD (a) and CC (b) tasks.
  • Figure 3: The overall structure of our proposed ChangeMinds for detecting and describing changes. (a) Transformer-based Siamese Encoder. (b) Change-aware LSTM (ChangeLSTM). (c) Multi-task Predictor. (d) CD Classifier and (e) CC Classifier are within the Multi-task Predictor.
  • Figure 4: Illustration of the proposed ChangeLSTM module.
  • Figure 5: Qualitative comparisons of ChangeMinds and MCINet [liu2024change] on the LEVIR-MCI dataset. From left to right: the pre-change image, the post-change image, the CD ground-truth label, the change map predicted by MCINet, the change map predicted by ChangeMinds, and a comparison of the captions generated by MCINet and our method, with GT denoting the annotated ground-truth captions.
  • ...and 3 more figures
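
Figure 3 describes a multi-task predictor that contains both a CD classifier and a CC classifier, and the abstract states that cross-attention links image and text features. The following minimal NumPy sketch shows single-head scaled dot-product cross-attention in which caption token embeddings act as queries over flattened change-aware image features, together with a per-pixel CD classifier. The attention direction, the single-head design, the dimensions, the number of classes, and the names `cross_attention` and `cd_head` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text, image, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention (illustrative).

    text:  (T, D) caption token embeddings, used as queries
    image: (N, D) flattened change-aware visual features, used as keys/values
    Returns (T, D) visually grounded text features and (T, N) attention weights.
    """
    q = text @ Wq
    k = image @ Wk
    v = image @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)  # each text token's weights over pixels sum to 1
    return attn @ v, attn

def cd_head(image_map, Wc):
    """Hypothetical per-pixel CD classifier: (H, W, D) features -> (H, W) class ids."""
    logits = image_map @ Wc  # (H, W, num_classes)
    return logits.argmax(axis=-1)

# Usage with random tensors standing in for shared change-aware features.
rng = np.random.default_rng(0)
D, num_classes = 32, 3  # assumed sizes for illustration only
image_map = rng.standard_normal((8, 8, D))
text = rng.standard_normal((5, D))
Wq, Wk, Wv = (0.1 * rng.standard_normal((D, D)) for _ in range(3))
Wc = 0.1 * rng.standard_normal((D, num_classes))
context, attn = cross_attention(text, image_map.reshape(-1, D), Wq, Wk, Wv)
change_map = cd_head(image_map, Wc)
print(context.shape, attn.shape, change_map.shape)  # (5, 32) (5, 64) (8, 8)
```

Both heads read the same image features, so in a jointly trained model the gradients from the captioning loss and the change-map loss would both shape that shared representation, which is the kind of synergy the multi-task design aims for.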