Segregation and Context Aggregation Network for Real-time Cloud Segmentation

Yijie Li, Hewei Wang, Jiayi Zhang, Jinjiang You, Jinfeng Xu, Puzhen Wu, Yunzhong Xiao, Soumyabrata Dev

TL;DR

SCANet tackles real-time ground-based sky/cloud segmentation by coupling a lightweight CNN with the Segregation and Context Aggregation Module (SCAM), which refines rough sky/cloud maps by processing foreground (cloud) and background (sky) features separately. The method achieves state-of-the-art accuracy while drastically reducing parameter counts (SCANet-large: 4.29M; SCANet-lite: 0.09M) and enabling edge-friendly, real-time inference, including 1390 FPS in FP16 for the lightweight variant. A novel SCAM decoder with stage-wise supervision and a BCE+IoU loss formulation underpins the performance gains, while an efficient SWINySEG-based pre-training strategy reduces reliance on ImageNet pre-training. Evaluations on the SWINySEG dataset demonstrate robust daytime and nighttime performance, with parameter savings and inference speeds that make deployment practical for climate monitoring and weather-related applications.
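The BCE+IoU loss with stage-wise supervision mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bce_iou_loss` and `stagewise_loss`, the soft-IoU form, the equal stage weights, and the assumption that every stage output has already been resized to the label resolution are all illustrative choices that the summary does not specify.

```python
import math

def bce_iou_loss(probs, targets, eps=1e-7):
    """BCE + soft-IoU loss for one flattened probability map (illustrative).

    probs   -- predicted cloud probabilities in [0, 1]
    targets -- ground-truth labels (0 = sky, 1 = cloud)
    """
    # Binary cross-entropy averaged over pixels; probabilities are
    # clamped so that log() never receives 0.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Soft IoU: intersection and union computed on probabilities
    # instead of hard masks, so the term stays differentiable.
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    iou = 1.0 - (inter + eps) / (union + eps)
    return bce + iou

def stagewise_loss(stage_probs, targets, weights=None):
    """Sum the BCE+IoU loss over several decoder-stage outputs.

    Equal weights are an assumption; all maps are assumed to share
    the resolution of `targets`.
    """
    if weights is None:
        weights = [1.0] * len(stage_probs)
    return sum(w * bce_iou_loss(p, targets)
               for w, p in zip(weights, stage_probs))
```

The BCE term penalises per-pixel errors, while the IoU term penalises poor overlap of the predicted cloud region as a whole; supervising every stage applies both signals to intermediate outputs as well as the final one.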

Abstract

Cloud segmentation from intensity images is a pivotal task in atmospheric science and computer vision, aiding weather forecasting and climate analysis. Ground-based sky/cloud segmentation extracts clouds from images for further feature analysis. Existing methods struggle to balance segmentation accuracy and computational efficiency, which limits real-world deployment on edge devices. To address this, we introduce SCANet, a novel lightweight cloud segmentation model featuring the Segregation and Context Aggregation Module (SCAM), which refines rough segmentation maps into weighted sky and cloud features that are processed separately. SCANet achieves state-of-the-art performance while drastically reducing computational complexity. SCANet-large (4.29M parameters) achieves accuracy comparable to state-of-the-art methods with 70.9% fewer parameters. Meanwhile, SCANet-lite (90K parameters) delivers 1390 FPS in FP16, surpassing real-time standards. Additionally, we propose an efficient pre-training strategy that enhances performance even without ImageNet pre-training.
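To make the segregation idea concrete, the following sketch splits one feature channel into cloud-weighted and sky-weighted streams using a rough segmentation map, processes each stream separately, and merges the results. It is a simplified illustration under stated assumptions, not the SCAM module itself: the sigmoid weighting, the complementary sky weight `1 - w`, the additive merge, and the names `segregate` and `aggregate` are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def segregate(features, coarse_logits):
    """Split features into cloud- and sky-weighted streams.

    features      -- per-pixel feature values (one channel, flattened)
    coarse_logits -- rough segmentation logits, same length
    """
    fg = [sigmoid(s) for s in coarse_logits]  # cloud weights in (0, 1)
    bg = [1.0 - w for w in fg]                # sky weights (assumed complement)
    cloud_feats = [f * w for f, w in zip(features, fg)]
    sky_feats = [f * w for f, w in zip(features, bg)]
    return cloud_feats, sky_feats

def aggregate(cloud_feats, sky_feats, cloud_branch, sky_branch):
    """Apply a separate function to each stream, then merge by addition (assumed)."""
    return [cloud_branch(c) + sky_branch(s)
            for c, s in zip(cloud_feats, sky_feats)]

# Usage: with identity branches the two streams sum back to the input,
# which shows that the weighting only redistributes the features.
feats = [2.0, 4.0, 6.0]
cloud, sky = segregate(feats, [0.0, 3.0, -3.0])
merged = aggregate(cloud, sky, lambda c: c, lambda s: s)
```

In a real network the two branches would be learned layers, so each stream can specialise on cloud or sky regions before the features are recombined.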

Paper Structure

This paper contains 19 sections, 4 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: #Params vs. SWINySEG Accuracy. Our proposed SCANet model balances model size and accuracy. SCANet-large achieves $97.0\%$ accuracy on SWINySEG with 4.29 million parameters, while SCANet-lite achieves $94.4\%$ accuracy with only 90k parameters.
  • Figure 2: The overall architecture of SCANet and SCAM. (a) presents the pipeline of our SCANet; (b) details the design of our proposed SCAM module; (c) depicts the decoder structure preceding the SCAM modules. Additionally, the architecture of the inverted residual block \cite{sandler2018mobilenetv2} is shown in Fig. \ref{fig:blocks_compare} in Appendix \ref{sec:scanet_basic}. The up-sample block consists of an inverted residual block paired with a bilinear up-sample layer.
  • Figure 3: Qualitative comparison of SCANet-large with state-of-the-art approaches on day-time (rows 1–2) and night-time (rows 3–4) images from the SWINySEG dataset.
  • Figure 4: Schematic diagram of SWINySEG-based pre-training. (a) illustrates the positive and negative sample generation process. (b) shows the negative samples. (c) shows the positive samples. (d) shows the modules involved in pre-training.
  • Figure 5: Visualization of the SCAM output $s_{i}$ (first row) and the background mask $m_{i}$ (second row).
  • ...and 4 more figures