Context-self contrastive pretraining for crop type semantic segmentation
Michail Tarasiou, Riza Alp Guler, Stefanos Zafeiriou
TL;DR
The paper tackles boundary misclassification in pixel-level crop-type segmentation from Satellite Image Time Series (SITS) by introducing the Context-Self Contrastive Loss (CSCL), a fully supervised contrastive pre-training scheme that enforces semantically consistent embeddings between each location and its local neighborhood. CSCL computes local affinities within a dilated window, augmented with relative positional encodings, and optimizes a cosine-based contrastive loss against targets derived from the dense ground-truth labels. The pre-training requires no additional data and improves boundary delineation in dense segmentation. Empirically, CSCL achieves state-of-the-art results on France and Germany crop-type datasets, and the authors release the largest publicly available SITS crop-segmentation dataset, together with ×4 super-resolution ground truth that enables higher-resolution crop mapping. The reported results show gains concentrated at parcel boundaries, supported by ablations, with practical benefits for high-granularity crop monitoring and policy-support applications and potential for integrating CSCL into other dense-prediction tasks.
Abstract
In this paper, we propose a fully supervised pre-training scheme based on contrastive learning that is tailored to dense classification tasks. The proposed Context-Self Contrastive Loss (CSCL) learns an embedding space that makes semantic boundaries pop up by using a similarity metric between every location in a training sample and its local context. For crop type semantic segmentation from Satellite Image Time Series (SITS), we find performance at parcel boundaries to be a critical bottleneck and explain how CSCL tackles the underlying cause of that problem, improving the state-of-the-art performance in this task. Additionally, using images from the Sentinel-2 (S2) satellite missions, we compile the largest, to our knowledge, SITS dataset densely annotated by crop type and parcel identities, which we make publicly available together with the data generation pipeline. Using that data, we find that CSCL, even with minimal pre-training, improves all respective baselines, and we present a process for semantic segmentation at super-resolution for obtaining crop classes at a more granular level. The code and instructions to download the data can be found at https://github.com/michaeltrs/DeepSatModels.
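
The idea of comparing every location with its local context can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that). It is a dependency-free toy version built on assumptions: the function names `cosine` and `context_self_loss` are hypothetical, cosine similarity is rescaled from [-1, 1] to [0, 1] and scored with a binary cross-entropy against "same class" targets, and relative positional encodings are omitted.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def context_self_loss(emb, labels, window=3, dilation=1, eps=1e-7):
    """Mean binary cross-entropy between local affinities and same-class targets.

    emb[i][j] is the embedding vector at pixel (i, j); labels[i][j] is its class id.
    Assumption (not taken from the paper): the loss form is BCE on rescaled cosine.
    """
    height, width = len(labels), len(labels[0])
    radius = window // 2
    total, count = 0.0, 0
    for i in range(height):
        for j in range(width):
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    if di == 0 and dj == 0:
                        continue  # skip the location itself
                    ni, nj = i + di * dilation, j + dj * dilation
                    if not (0 <= ni < height and 0 <= nj < width):
                        continue  # neighbour falls outside the sample
                    # Rescale similarity to [0, 1] and clamp for a finite log.
                    p = 0.5 * (cosine(emb[i][j], emb[ni][nj]) + 1.0)
                    p = min(max(p, eps), 1.0 - eps)
                    target = 1.0 if labels[i][j] == labels[ni][nj] else 0.0
                    total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
                    count += 1
    return total / count if count else 0.0


# Toy sample: two parcels side by side (class 0 on the left, class 1 on the right).
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
# Embeddings that separate the two classes versus embeddings that ignore the boundary.
separated = [[[1.0, 0.0] if c == 0 else [-1.0, 0.0] for c in row] for row in labels]
collapsed = [[[1.0, 0.0] for _ in row] for row in labels]

loss_separated = context_self_loss(separated, labels)
loss_collapsed = context_self_loss(collapsed, labels)
print(loss_separated < loss_collapsed)
```

In this toy setting, only pixel pairs that straddle the parcel boundary are penalised when the embeddings collapse to a single vector, which reflects the intuition that the loss concentrates its signal at semantic boundaries.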
