UniChange: Unifying Change Detection with Multimodal Large Language Model

Xu Zhang; Danyang Li; Xiaohang Dong; Tianhao Wu; Hualong Yu; Jianye Wang; Qicheng Li; Xiang Li

TL;DR

UniChange presents a unified framework for change detection by leveraging a multimodal large language model. It unifies binary and semantic change detection through an embedding-as-change paradigm that uses three special tokens to condition pixel-level segmentation on textual prompts, enabling joint training on diverse, potentially conflicting datasets. The approach achieves state-of-the-art performance across WHU-CD, S2Looking, LEVIR-CD+, and SECOND, validating its generality and practical impact for remote sensing applications. By eliminating task-specific heads and enabling end-to-end optimization, UniChange offers a scalable path toward versatile, cross-domain change analysis in geospatial monitoring.

Abstract

Change detection (CD) is a fundamental task for monitoring and analyzing land cover dynamics. While recent high-performance models and high-quality datasets have significantly advanced the field, a critical limitation persists. Current models typically acquire limited knowledge from single-type annotated data and cannot concurrently leverage diverse binary change detection (BCD) and semantic change detection (SCD) datasets. This constraint leads to poor generalization and limited versatility. Recent advances in Multimodal Large Language Models (MLLMs) introduce new possibilities for a unified CD framework. We leverage the language priors and unification capabilities of MLLMs to develop UniChange, the first MLLM-based unified change detection model. UniChange integrates generative language abilities with specialized CD functionalities. Our model unifies the BCD and SCD tasks through three special tokens: [T1], [T2], and [CHANGE]. Furthermore, UniChange uses text prompts to guide the identification of change categories, eliminating the reliance on predefined classification heads. This design allows UniChange to acquire knowledge from multi-source datasets, even when their class definitions conflict. Experiments on four public benchmarks (WHU-CD, S2Looking, LEVIR-CD+, and SECOND) demonstrate state-of-the-art (SOTA) performance, with IoU scores of 90.41, 53.04, 78.87, and 57.62, respectively, surpassing all previous methods. The code is available at https://github.com/Erxucomeon/UniChange.
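The IoU scores quoted above are intersection-over-union values reported as percentages. The following is a minimal sketch of that metric for a single pair of binary change masks. It is not taken from the UniChange code: the function name `binary_iou`, the nested-list mask format, and the empty-mask convention are assumptions made for illustration, and the abstract does not state how scores are aggregated over a dataset or over semantic classes.

```python
def binary_iou(pred, target):
    """IoU of the 'change' class for two same-shaped binary masks.

    Masks are nested lists of 0/1 values (hypothetical format for illustration).
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1  # pixel marked as changed in both masks
            if p or t:
                union += 1  # pixel marked as changed in at least one mask
    # Assumed convention: two masks with no changed pixels agree perfectly.
    return 1.0 if union == 0 else intersection / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are changed in at least one mask.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
print(round(100 * binary_iou(pred, target), 2))  # prints 50.0
```

On this scale, the reported 90.41 on WHU-CD would correspond to an intersection covering roughly nine tenths of the union of predicted and annotated change pixels, under whatever aggregation the benchmark uses.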
