Segregation and Context Aggregation Network for Real-time Cloud Segmentation
Yijie Li, Hewei Wang, Jiayi Zhang, Jinjiang You, Jinfeng Xu, Puzhen Wu, Yunzhong Xiao, Soumyabrata Dev
TL;DR
SCANet tackles real-time ground-based sky/cloud segmentation by coupling a lightweight CNN with the Segregation and Context Aggregation Module (SCAM), which refines rough sky/cloud maps by processing foreground (cloud) and background (sky) features separately. The method achieves state-of-the-art accuracy with far fewer parameters (SCANet-large: 4.29M; SCANet-lite: 0.09M) and supports edge-friendly, real-time inference, reaching 1390 FPS in FP16 for the lightweight variant. The performance gains rest on a novel SCAM decoder trained with stage-wise supervision and a combined BCE+IoU loss, while an efficient SWINySEG-based pre-training strategy reduces reliance on ImageNet pre-training. Evaluations on the SWINySEG dataset show robust daytime and nighttime performance, and the parameter savings and fast inference make the model practical to deploy for climate monitoring and weather-related applications.
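To make the BCE+IoU loss with stage-wise supervision concrete, here is a minimal pure-Python sketch. It assumes a soft (differentiable) IoU term, equal weighting of the two terms, and an unweighted sum over decoder stages; none of these details are given in the summary above, and the function names (`bce_loss`, `soft_iou_loss`, `stagewise_loss`) are illustrative rather than taken from the authors' code.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel cloud probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU; products and sums stand in for set intersection and union."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)

def stagewise_loss(stage_preds, target):
    """Sum BCE + IoU over every decoder stage's prediction.

    Assumption: all stages are weighted equally and share one target mask.
    """
    return sum(bce_loss(p, target) + soft_iou_loss(p, target) for p in stage_preds)

# Toy usage: two decoder stages predicting a 4-pixel mask.
target = [1.0, 0.0, 1.0, 0.0]
coarse = [0.6, 0.4, 0.7, 0.3]   # earlier, rougher stage
refined = [0.9, 0.1, 0.8, 0.2]  # later, sharper stage
loss = stagewise_loss([coarse, refined], target)
```

The BCE term penalizes each pixel independently, while the IoU term scores the overlap of the whole mask, which is a common motivation for combining the two in segmentation.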
Abstract
Cloud segmentation from intensity images is a pivotal task in atmospheric science and computer vision, aiding weather forecasting and climate analysis. Ground-based sky/cloud segmentation extracts clouds from images for further feature analysis. Existing methods struggle to balance segmentation accuracy and computational efficiency, limiting real-world deployment on edge devices. To address this, we introduce SCANet, a novel lightweight cloud segmentation model featuring the Segregation and Context Aggregation Module (SCAM), which refines rough segmentation maps into weighted sky and cloud features that are processed separately. SCANet achieves state-of-the-art performance while drastically reducing computational complexity. SCANet-large (4.29M parameters) achieves accuracy comparable to state-of-the-art methods with 70.9% fewer parameters. Meanwhile, SCANet-lite (90K parameters) delivers 1390 FPS in FP16, surpassing real-time standards. Additionally, we propose an efficient pre-training strategy that enhances performance even without ImageNet pre-training.
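The segregation step described above can be illustrated with a small sketch. It assumes the rough map holds per-pixel cloud probabilities p, and that the cloud and sky branches receive features weighted by p and 1 - p respectively. This complementary weighting is an assumption made for illustration; the actual SCAM design may differ, and `segregate` is a hypothetical helper name.

```python
def segregate(features, rough_map):
    """Split per-pixel features into cloud-weighted and sky-weighted copies.

    features:  list of per-pixel feature vectors (list of lists of floats)
    rough_map: list of per-pixel cloud probabilities in [0, 1]
    Assumption: cloud weight is p and sky weight is 1 - p.
    """
    cloud = [[p * f for f in vec] for vec, p in zip(features, rough_map)]
    sky = [[(1.0 - p) * f for f in vec] for vec, p in zip(features, rough_map)]
    return cloud, sky

# Toy usage: two pixels with two feature channels each.
features = [[1.0, 2.0], [3.0, 4.0]]
rough_map = [0.75, 0.25]  # first pixel is probably cloud, second probably sky
cloud_feats, sky_feats = segregate(features, rough_map)
```

Under this assumption the two weighted copies sum back to the original features, so each branch can specialize on its own region without discarding information.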
