Improving satellite imagery segmentation using multiple Sentinel-2 revisits
Kartik Jindgar, Grace W. Lindsay
TL;DR
This study asks how best to exploit multiple revisits in Sentinel-2 imagery when fine-tuning pre-trained remote sensing models for semantic segmentation. It systematically compares five multi-temporal input strategies across three architectures (U-Net, ViT, SWIN) and finds that fusing revisits in the latent space, particularly with a SWIN Transformer backbone, yields the strongest performance gains. The findings carry over to a second task, with consistent improvements on the PhilEO building-density dataset, which supports their relevance to climate-related remote sensing applications. The results can also inform future hybrid fusion strategies for satellite imagery analysis.
Abstract
In recent years, analysis of remote sensing data has benefited immensely from borrowing techniques from the broader field of computer vision, such as the use of shared models pre-trained on large and diverse datasets. However, satellite imagery has unique features that are not accounted for in traditional computer vision, such as the existence of multiple revisits of the same location. Here, we explore the best way to use revisits in the framework of fine-tuning pre-trained remote sensing models. We focus on an applied research question of relevance to climate change mitigation -- power substation segmentation -- that is representative of applied uses of pre-trained models more generally. Through extensive tests of different multi-temporal input schemes across diverse model architectures, we find that fusing representations from multiple revisits in the model latent space is superior to other methods of using revisits, including using them as a form of data augmentation. We also find that a SWIN Transformer-based architecture performs better than U-Net and ViT-based models. We verify the generality of our results on a separate building density estimation task.
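To make the idea of latent-space fusion concrete, the following is a minimal sketch rather than the authors' implementation. Each revisit of a location is encoded separately with shared weights, and the resulting feature maps are combined before decoding. The abstract does not specify the encoder, the fusion operator, or the decoder, so `toy_encoder`, `fuse_latents`, `toy_decoder`, the mean/max fusion choices, and all tensor shapes here are illustrative assumptions; a real pipeline would use a pre-trained backbone in place of the toy encoder.

```python
import numpy as np

def toy_encoder(image, weights):
    """Hypothetical stand-in for a pre-trained backbone.

    image:   (H, W, C_in) array for a single revisit
    weights: (C_in, C_latent) per-pixel linear projection
    Returns a (H, W, C_latent) feature map after a ReLU.
    """
    return np.maximum(image @ weights, 0.0)

def fuse_latents(revisits, weights, mode="mean"):
    """Encode every revisit with the same weights, then fuse in latent space.

    revisits: (T, H, W, C_in) stack of T revisits of the same location.
    The fusion operator (mean or max over T) is an assumption for illustration.
    """
    latents = np.stack([toy_encoder(r, weights) for r in revisits])
    if mode == "mean":
        return latents.mean(axis=0)
    if mode == "max":
        return latents.max(axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")

def toy_decoder(latent, head):
    """Hypothetical segmentation head: per-pixel linear map plus sigmoid."""
    logits = latent @ head  # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))

# Synthetic data: 4 revisits of an 8x8 tile with 13 bands (shapes are assumed).
rng = np.random.default_rng(0)
revisits = rng.random((4, 8, 8, 13))
weights = rng.normal(size=(13, 16))
head = rng.normal(size=16)

fused = fuse_latents(revisits, weights, mode="max")
mask = toy_decoder(fused, head)  # one mask predicted from all revisits
```

By contrast, treating revisits as data augmentation would pass each revisit through the model as an independent training sample, so the decoder never sees more than one acquisition of a location at a time. In this sketch the decoder instead receives a single feature map that summarizes all revisits.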
