
Beyond Quadratic: Linear-Time Change Detection with RWKV

Zhenyu Yang, Gensheng Pei, Tao Chen, Xia Yuan, Haofeng Zhang, Xiangbo Shu, Yazhou Yao

Abstract

Existing paradigms for remote sensing change detection are caught in a trade-off: CNNs excel at efficiency but lack global context, while Transformers capture long-range dependencies at a prohibitive computational cost. This paper introduces ChangeRWKV, a new architecture that reconciles this conflict. Built on the Receptance Weighted Key Value (RWKV) framework, ChangeRWKV uniquely combines the parallelizable training of Transformers with the linear-time inference of RNNs. At its core are two key innovations: a hierarchical RWKV encoder that builds multi-resolution feature representations, and a novel Spatial-Temporal Fusion Module (STFM) engineered to resolve spatial misalignments across scales while distilling fine-grained temporal discrepancies. ChangeRWKV achieves state-of-the-art performance on the LEVIR-CD benchmark, with 85.46% IoU and a 92.16% F1 score, while requiring substantially fewer parameters and FLOPs than previous leading methods. This work demonstrates a new, efficient, and powerful paradigm for operational-scale change detection. Our code and model are publicly available.
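The abstract's central efficiency claim is that RWKV pairs Transformer-style training with RNN-style linear-time inference. A minimal NumPy sketch of the causal WKV operator from the original RWKV-4 formulation makes this concrete: the same per-channel weighted average over past tokens can be computed either as an O(T²) sum over all earlier tokens or as a single recurrent pass that carries only two running sums per channel. The function names, shapes, and random inputs are illustrative assumptions. The abstract does not say which WKV variant ChangeRWKV uses (vision adaptations such as Vision-RWKV use a bidirectional form), and practical kernels add a log-space max-subtraction trick for numerical stability that is omitted here.

```python
import numpy as np


def wkv_recurrent(k, v, w, u):
    """Linear-time WKV: a single pass over T tokens with O(C) state.

    k, v: (T, C) keys and values; w: (C,) positive decay rates; u: (C,) current-token bonus.
    """
    T, C = k.shape
    a = np.zeros(C)  # decayed running sum of exp(k_i) * v_i over past tokens
    b = np.zeros(C)  # decayed running sum of exp(k_i) over past tokens
    decay = np.exp(-w)
    out = np.empty((T, C))
    for t in range(T):
        bonus = np.exp(u + k[t])  # extra weight given to the current token
        out[t] = (a + bonus * v[t]) / (b + bonus)
        a = decay * a + np.exp(k[t]) * v[t]
        b = decay * b + np.exp(k[t])
    return out


def wkv_quadratic(k, v, w, u):
    """Reference O(T^2) form: each output re-weights every earlier token."""
    T, C = k.shape
    out = np.empty((T, C))
    for t in range(T):
        num = np.exp(u + k[t]) * v[t]
        den = np.exp(u + k[t])
        for i in range(t):
            weight = np.exp(-(t - 1 - i) * w + k[i])  # exponential decay with distance
            num = num + weight * v[i]
            den = den + weight
        out[t] = num / den
    return out


rng = np.random.default_rng(0)
T, C = 16, 4
k, v, r = (rng.normal(size=(T, C)) for _ in range(3))
w = np.abs(rng.normal(size=C))  # positive per-channel decay
u = rng.normal(size=C)

wkv = wkv_recurrent(k, v, w, u)
y = wkv / (1.0 + np.exp(-r))  # receptance gate: sigmoid(r) * wkv
print(np.allclose(wkv, wkv_quadratic(k, v, w, u)))  # True
```

The recurrent version keeps two C-dimensional sums no matter how long the token sequence grows, whereas self-attention forms a T×T score matrix. This constant-size state is the property the abstract credits for linear-time inference.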

Paper Structure

This paper contains 15 sections, 9 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Efficiency versus accuracy on the LEVIR-CD benchmark. The proposed ChangeRWKV family establishes a new state-of-the-art efficiency-accuracy frontier, delivering higher IoU scores while requiring significantly fewer FLOPs and parameters than existing methods. For instance, the tiny model achieves a competitive 84.92% IoU with only 4.7M parameters and 9.40G FLOPs.
  • Figure 2: Overall architecture of ChangeRWKV. The model consists of three main components: (a) a Hierarchical RWKV Encoder that extracts multi-scale features, (b) a Spatial-Temporal Fusion Module (STFM) that integrates spatial and temporal cues, and (c) a lightweight Decoder that generates the change mask. The STFM is further decomposed into (b1) a Spatial Fusion Module (SFM) for multi-scale spatial alignment and (b2) a Temporal Fusion Module (TFM) for bi-temporal interaction.
  • Figure 3: Qualitative results on the LEVIR-CD test set. Predicted outputs are color-coded as follows: white for true positives (TP), black for true negatives (TN), green for false positives (FP), and red for false negatives (FN).
  • Figure 4: Qualitative results on the SAR-CD test set. Predicted outputs are color-coded as follows: white for true positives (TP), black for true negatives (TN), green for false positives (FP), and red for false negatives (FN).
  • Figure 5: Computational scalability analysis. ChangeRWKV shows near-linear growth in (a) FLOPs, (b) training memory, and (c) inference memory, scaling far more efficiently than Transformer-based models at high resolutions.
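The color legend in Figures 3 and 4 and the IoU/F1 scores in the abstract both rest on the same per-pixel confusion counts. The following NumPy sketch shows how such an error map and the change-class scores could be computed. It assumes binary prediction and ground-truth masks of equal shape (1 = changed) and pure RGB values for the named colors, since the captions give only color names. The helper names are hypothetical and do not come from the ChangeRWKV code. When both scores are computed from the same TP/FP/FN counts, F1 = 2·IoU/(1+IoU), and the reported 85.46% IoU and 92.16% F1 agree with this identity to rounding.

```python
import numpy as np

# RGB triples assumed for the Figure 3/4 legend; the captions name the colors only.
COLORS = {
    "tp": (255, 255, 255),  # white: true positive
    "tn": (0, 0, 0),        # black: true negative
    "fp": (0, 255, 0),      # green: false positive
    "fn": (255, 0, 0),      # red: false negative
}


def confusion_masks(pred, gt):
    """Per-pixel outcome masks for binary change maps of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return {"tp": pred & gt, "tn": ~pred & ~gt, "fp": pred & ~gt, "fn": ~pred & gt}


def error_map(pred, gt):
    """H x W x 3 uint8 image colored by outcome, as in the qualitative figures."""
    img = np.zeros(pred.shape + (3,), dtype=np.uint8)
    for name, mask in confusion_masks(pred, gt).items():
        img[mask] = COLORS[name]  # broadcast the RGB triple to every selected pixel
    return img


def iou_f1(pred, gt):
    """IoU and F1 of the change class, both from the same TP/FP/FN counts."""
    m = confusion_masks(pred, gt)
    tp, fp, fn = (int(m[n].sum()) for n in ("tp", "fp", "fn"))
    return tp / (tp + fp + fn), 2 * tp / (2 * tp + fp + fn)


pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
iou, f1 = iou_f1(pred, gt)
print(iou, round(f1, 4))  # 0.5 0.6667
em = error_map(pred, gt)
print(em[0, 1].tolist(), em[1, 2].tolist())  # [0, 255, 0] [255, 0, 0]

# Same-count identity F1 = 2 * IoU / (1 + IoU), checked against the reported LEVIR-CD scores.
print(round(2 * 0.8546 / (1 + 0.8546), 4))  # 0.9216
```

The identity holds for scores computed from one pooled set of counts. Averaging per-image scores breaks it, and the abstract does not state which evaluation protocol is used.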