CD-CTFM: A Lightweight CNN-Transformer Network for Remote Sensing Cloud Detection Fusing Multiscale Features
Wenxuan Ge, Xubing Yang, Li Zhang
TL;DR
Cloud-contaminated optical remote sensing images hinder information extraction, requiring reliable cloud masks. The authors introduce CD-CTFM, a lightweight encoder–decoder that fuses local and global features via a CNN–Transformer backbone, a Lightweight Feature Pyramid Module, and a Lightweight Channel-Spatial Attention module. On the 38-Cloud and MODIS datasets, CD-CTFM achieves competitive accuracy with substantially fewer parameters and GFLOPs than state-of-the-art methods. The work demonstrates that careful multiscale feature fusion and efficient attention can deliver accurate cloud detection at lower computational cost, enabling faster preprocessing for large-scale remote sensing pipelines.
Abstract
Clouds in remote sensing images inevitably interfere with information extraction, which hinders subsequent analysis of satellite images. Hence, cloud detection is a necessary preprocessing step. However, existing methods require heavy computation and a large number of parameters. In this letter, a lightweight CNN-Transformer network, CD-CTFM, is proposed to address this problem. CD-CTFM is based on an encoder-decoder architecture and incorporates an attention mechanism. In the encoder part, we use a lightweight network combining CNN and Transformer as the backbone, which helps extract local and global features simultaneously. Moreover, a lightweight feature pyramid module is designed to fuse multiscale features with contextual information. In the decoder part, we integrate a lightweight channel-spatial attention module into each skip connection between encoder and decoder, extracting low-level features while suppressing irrelevant information without introducing many parameters. Finally, the proposed model is evaluated on two cloud datasets, 38-Cloud and MODIS. The results demonstrate that CD-CTFM achieves accuracy comparable to that of state-of-the-art methods while outperforming them in efficiency.
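
To make the idea of channel-spatial attention on a skip connection concrete, the following is a minimal NumPy sketch. It is not the authors' module: the abstract does not specify its internals, so the pooling choices, the sigmoid gates, the function name channel_spatial_attention, and the weight matrix w_channel are all assumptions chosen for illustration. The sketch only shows the general pattern of reweighting encoder features first per channel and then per pixel before they reach the decoder.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_spatial_attention(skip, w_channel):
    """Gate a skip-connection feature map by channel, then by location.

    skip:      array of shape (C, H, W), encoder features on a skip connection.
    w_channel: array of shape (C, C), hypothetical channel-mixing weights
               (an assumption; the paper's module may be built differently).
    """
    # Channel attention: global-average-pool each channel, mix, squash to (0, 1).
    pooled = skip.mean(axis=(1, 2))                    # shape (C,)
    channel_gate = sigmoid(w_channel @ pooled)         # shape (C,)
    gated = skip * channel_gate[:, None, None]         # broadcast over H and W

    # Spatial attention: pool across channels (mean plus max), squash to (0, 1).
    spatial_gate = sigmoid(gated.mean(axis=0) + gated.max(axis=0))  # shape (H, W)
    return gated * spatial_gate[None, :, :]            # broadcast over channels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    skip = rng.standard_normal((8, 16, 16))            # toy encoder features
    w = rng.standard_normal((8, 8)) * 0.1              # toy channel weights
    out = channel_spatial_attention(skip, w)
    print(out.shape)
```

Because both gates lie strictly between 0 and 1, the output keeps the input shape and never exceeds the input in magnitude. In this sketch the only learnable part is the C-by-C channel matrix, which illustrates how such a module can add few parameters.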
