A transformer boosted UNet for smoke segmentation in complex backgrounds in multispectral LandSat imagery
Jixue Liu, Jiuyong Li, Stefan Peters, Liang Zhao
TL;DR
The paper addresses pixel-level smoke segmentation in multispectral Landsat imagery, where variable smoke density, complex backgrounds, and thin smoke make detection difficult. It introduces VTrUNet, which combines a virtual-channel (VC) construction module with a transformer-boosted UNet. The architecture expands a 6-band input to 64 channels and uses a ViT-inspired transformer block (TrfB) at each UNet level to capture long-range contextual relationships, with a final MLP mapping to three classes: Smoke, Cloud, and Clear. A moderated F1 score, $F1_h$, is proposed to evaluate performance under partial labeling; it accounts for unlabelled gaps and provides robust class-level and image-level averages. Experiments on Landsat data show that VTrUNet, particularly with VC and internal TrfB wiring, achieves the best performance among the recent segmentation models compared, highlighting the value of spectral pattern learning and long-range context for smoke detection in challenging, partially labeled remote-sensing scenes.
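To make the idea of scoring under partial labeling concrete, the following is a minimal sketch of a plain per-class F1 computed over labelled pixels only. It is not the paper's $F1_h$, whose definition is not given in this summary. The sentinel `UNLABELLED = -1`, the class ids, and the helper name `masked_f1_per_class` are assumptions made for illustration.

```python
# Illustrative only: plain per-class F1 restricted to labelled pixels.
# This is NOT the paper's F1_h. UNLABELLED, the class ids, and the
# helper name are hypothetical choices for this sketch.
UNLABELLED = -1
CLASSES = {0: "Smoke", 1: "Cloud", 2: "Clear"}


def masked_f1_per_class(y_true, y_pred, classes=CLASSES, ignore=UNLABELLED):
    """Return {class_name: F1}, counting only pixels that carry a label."""
    scores = {}
    for cid, name in classes.items():
        tp = fp = fn = 0
        for t, p in zip(y_true, y_pred):
            if t == ignore:
                continue  # unlabelled pixel: contributes to no count
            if p == cid and t == cid:
                tp += 1
            elif p == cid:
                fp += 1
            elif t == cid:
                fn += 1
        denom = 2 * tp + fp + fn
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


# Flattened toy image: the ground truth has two unlabelled pixels.
y_true = [0, 0, 1, 2, -1, -1, 2, 1]
y_pred = [0, 1, 1, 2, 0, 2, 2, 1]

scores = masked_f1_per_class(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)  # simple class-level average
print(scores, macro_f1)
```

Predictions on unlabelled pixels are neither rewarded nor penalised here, which is the simplest way to handle gaps in the annotation. A moderated score such as $F1_h$ presumably treats those gaps more carefully.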
Abstract
Many studies have addressed smoke detection in satellite imagery. However, these prior methods are still not effective at detecting diverse smoke in complex backgrounds. Smoke is challenging to detect because of variations in density, color, lighting, and backgrounds such as clouds, haze, and/or mist, as well as the contextual nature of thin smoke. This paper addresses these challenges by proposing a new segmentation model called VTrUNet, which consists of a virtual band construction module to capture spectral patterns and a transformer-boosted UNet to capture long-range contextual features. The model takes six-band imagery as input: red, green, blue, near-infrared, and two shortwave-infrared bands. To show the advantages of the proposed model, the paper presents extensive results for various possible model architectures that improve on UNet and draws interesting conclusions, including that adding more modules to a model does not always lead to better performance. The paper also compares the proposed model with very recently proposed, related models for smoke segmentation and shows that the proposed model performs best and significantly improves prediction performance.
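The data flow at the two ends of the model can be sketched at the level of tensor shapes: six input bands are expanded to 64 channels (the figure given in the TL;DR), and per-pixel features are finally mapped to the three classes. The sketch below is only a shape illustration with random weights. It assumes the expansion is a per-pixel linear map followed by a ReLU and stands in a single linear layer for the final MLP. It omits the UNet encoder/decoder and the transformer blocks entirely, and all function and variable names are hypothetical.

```python
import numpy as np

BANDS = ["red", "green", "blue", "nir", "swir1", "swir2"]  # six input bands
CLASSES = ["Smoke", "Cloud", "Clear"]
N_VIRTUAL = 64  # channel count after expansion, as stated in the TL;DR

rng = np.random.default_rng(0)


def expand_channels(image, weight, bias):
    """Assumed per-pixel linear map + ReLU: (H, W, 6) -> (H, W, 64)."""
    return np.maximum(image @ weight + bias, 0.0)


def classify_pixels(features, weight, bias):
    """One linear layer standing in for the final MLP: (H, W, 64) -> (H, W)."""
    logits = features @ weight + bias
    return logits.argmax(axis=-1)  # index into CLASSES for every pixel


# Toy 32x32 patch with random values in [0, 1) for each band.
image = rng.random((32, 32, len(BANDS)))

# Random, untrained weights: only the shapes are meaningful here.
w_in = rng.normal(size=(len(BANDS), N_VIRTUAL))
b_in = np.zeros(N_VIRTUAL)
w_out = rng.normal(size=(N_VIRTUAL, len(CLASSES)))
b_out = np.zeros(len(CLASSES))

features = expand_channels(image, w_in, b_in)     # (32, 32, 64)
labels = classify_pixels(features, w_out, b_out)  # (32, 32)
print(features.shape, labels.shape)
```

In the actual model, the 64-channel features would pass through the transformer-boosted UNet before classification, so the label map here carries no meaning beyond its shape and value range.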
