Towards Remote Sensing Change Detection with Neural Memory
Zhenyu Yang, Gensheng Pei, Yazhou Yao, Tianfei Zhou, Lizhong Ding, Fumin Shen
TL;DR
This work tackles remote sensing change detection (RSCD) by combining a Titans-inspired neural memory backbone with segmented local attention, so that the model captures long-range context while preserving local detail at high resolution. The proposed ChangeTitans framework comprises a memory-augmented VTitans backbone, a multi-scale VTitans-Adapter, and a two-stream TS-CBAM fusion module, followed by a convex upsampling decoder; the model is trained with a combined BCE and Dice loss to produce accurate, spatially coherent change maps. On four public benchmarks (LEVIR-CD, WHU-CD, LEVIR-CD+, and SYSU-CD), ChangeTitans achieves state-of-the-art performance at competitive computational cost, for example an IoU of 84.36% and an F1-score of 91.52% on LEVIR-CD, and it also performs strongly on SAR-CD. These results suggest that integrating neural memory with segmented attention is a practical route to robust, scalable, high-precision change mapping across diverse sensing conditions.
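The TL;DR names a BCE+Dice training loss and reports IoU and F1 on the change class. The NumPy sketch below shows one common way to compute both, assuming a single-channel logit map, an equal weighting of the two loss terms (the `dice_weight` parameter is hypothetical; the paper's weighting is not stated here), and IoU/F1 computed from pixel counts pooled over the evaluation set. It illustrates the quantities only and is not the authors' training or evaluation code.

```python
import numpy as np

def sigmoid(x):
    # tanh form of the logistic function; avoids overflow in exp for large |x|.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def bce_dice_loss(logits, target, dice_weight=1.0, eps=1e-6):
    # Binary cross-entropy with logits in the stable form log(1 + e^x) - y * x,
    # averaged over all pixels.
    bce = np.mean(np.logaddexp(0.0, logits) - target * logits)
    # Soft Dice loss on predicted probabilities: 1 - 2*sum(P*T) / (sum(P) + sum(T)).
    probs = sigmoid(logits)
    inter = np.sum(probs * target)
    dice = 1.0 - (2.0 * inter + eps) / (np.sum(probs) + np.sum(target) + eps)
    # dice_weight is a hypothetical knob, not a value taken from the paper.
    return bce + dice_weight * dice

def iou_f1(pred, target):
    # IoU and F1 of the "change" class from binary masks, with counts pooled
    # over all pixels. Undefined (0/0) if neither mask contains a change pixel.
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)

# Example: tp = 2, fp = 0, fn = 1, so IoU = 2/3 and F1 = 0.8.
target = np.array([[0, 1], [1, 1]])
pred = np.array([[0, 1], [0, 1]])
print(iou_f1(pred, target))
```

When IoU and F1 come from the same pooled counts, F1 = 2·IoU / (1 + IoU). The reported LEVIR-CD pair is consistent with this: 2 × 0.8436 / 1.8436 ≈ 0.9152.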
Abstract
Remote sensing change detection is essential for environmental monitoring, urban planning, and related applications. However, current methods often struggle to capture long-range dependencies while maintaining computational efficiency. Although Transformers can effectively model global context, their quadratic complexity in the number of tokens poses scalability challenges, and existing linear attention approaches frequently fail to capture intricate spatiotemporal relationships. Drawing inspiration from the recent success of Titans in language tasks, we present ChangeTitans, a Titans-based framework for remote sensing change detection. Specifically, we propose VTitans, the first Titans-based vision backbone, which integrates neural memory with segmented local attention to capture long-range dependencies while mitigating computational overhead. Next, we present a hierarchical VTitans-Adapter that refines multi-scale features across different network layers. Finally, we introduce TS-CBAM, a two-stream fusion module that leverages cross-temporal attention to suppress pseudo-changes and improve detection accuracy. Experiments on four benchmark datasets (LEVIR-CD, WHU-CD, LEVIR-CD+, and SYSU-CD) show that ChangeTitans achieves state-of-the-art results, attaining 84.36% IoU and 91.52% F1-score on LEVIR-CD, while remaining computationally competitive.
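The abstract does not say how VTitans combines neural memory with segmented local attention. The NumPy toy below only illustrates the general idea under stated assumptions: tokens are split into non-overlapping segments, each token attends only within its segment, and a linear associative memory carries context from earlier segments. The memory update (a gradient step on the associative loss ||K M - V||² with momentum and weight decay) is modeled on the general form described for Titans' neural memory, but the linear memory, the additive fusion of memory readout and local output, the hyperparameter values, and all function names here are hypothetical and are not the authors' VTitans.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    # Standard scaled dot-product attention: the score matrix is N x N.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def segmented_attention(q, k, v, seg_len):
    # Each token attends only within its own non-overlapping segment, so the
    # score tensor is (N / S) x S x S: cost grows linearly in N for fixed S.
    n, d = q.shape
    assert n % seg_len == 0, "sketch assumes N is a multiple of seg_len"
    g = n // seg_len
    qs, ks = q.reshape(g, seg_len, d), k.reshape(g, seg_len, d)
    vs = v.reshape(g, seg_len, -1)
    scores = qs @ ks.transpose(0, 2, 1) / np.sqrt(d)  # (g, S, S)
    return (softmax(scores) @ vs).reshape(n, -1)

def memory_augmented_attention(q, k, v, seg_len, lr=0.1, momentum=0.9, decay=0.01):
    # HYPOTHETICAL Titans-style toy: a linear memory M (d x d_v) is read with
    # each segment's queries, the readout is added to the local attention
    # output, and M is then updated by a gradient step with momentum and decay
    # on the segment's associative loss ||K M - V||^2 / S.
    n, d = q.shape
    mem = np.zeros((d, v.shape[1]))
    surprise = np.zeros_like(mem)  # momentum buffer over past gradients
    outs = []
    for start in range(0, n, seg_len):
        qs = q[start:start + seg_len]
        ks = k[start:start + seg_len]
        vs = v[start:start + seg_len]
        local = full_attention(qs, ks, vs)  # attention inside this segment only
        retrieved = qs @ mem                # context written by earlier segments
        outs.append(local + retrieved)
        grad = 2.0 * ks.T @ (ks @ mem - vs) / len(ks)
        surprise = momentum * surprise - lr * grad
        mem = (1.0 - decay) * mem + surprise
    return np.concatenate(outs, axis=0)
```

With `lr=0.0` the memory stays at zero and the output reduces to plain segmented attention. For N = 4096 tokens (a 64 × 64 grid) and S = 64, full attention forms 4096² = 16,777,216 scores per head, whereas segmented attention forms 4096 × 64 = 262,144, a 64-fold reduction; in this toy the memory read and write add only O(N·d·d_v) work.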
