Reuse out-of-year data to enhance land cover mapping via feature disentanglement and contrastive learning

Cassio F. Dantas, Raffaele Gaetano, Claudia Paris, Dino Ienco

TL;DR

The paper addresses the challenge of leveraging historical ground-truth data to improve current land cover mapping under domain shifts between time-separated satellite image time series. It introduces REFeD, a pseudo-siamese network that disentangles domain-invariant from domain-specific features via contrastive learning and multi-level supervision, training on both historical and current data while deploying only the invariant branch at inference. Empirical results on Koumbia (Burkina Faso) and Centre Val de Loire (France) show REFeD outperforms baselines, including semi-supervised/domain-adaptation and domain-generalization approaches, across multiple transfer tasks and crop types. This data-centric approach demonstrates that reusing out-of-year data can significantly enhance LC mapping, reducing the need for new ground truth campaigns while improving map quality for agricultural management and environmental monitoring.
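The TL;DR describes a train/inference asymmetry: both branches are trained, but only the invariant one is deployed. Below is a toy, untrained sketch of that structure in plain Python. It assumes single-layer encoders with random weights in place of the paper's actual SITS encoders, which this page does not specify. The names `PseudoSiameseToy`, `g_inv`, `g_spec` and `h` are hypothetical; `f` mirrors the classifier symbol used in the Figure 1 caption.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Bias-free dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class PseudoSiameseToy:
    """Two branches with the same layout but independent (unshared) weights."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int,
                 n_domains: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.g_inv = random_matrix(n_hidden, n_in, rng)   # domain-invariant encoder
        self.g_spec = random_matrix(n_hidden, n_in, rng)  # domain-specific encoder
        self.f = random_matrix(n_classes, n_hidden, rng)  # land cover classifier
        self.h = random_matrix(n_domains, n_hidden, rng)  # domain classifier

    def training_forward(self, x: Vector) -> Dict[str, Vector]:
        """Training: both branches run; each head sees only its own branch."""
        z_inv = relu(linear(self.g_inv, x))
        z_spec = relu(linear(self.g_spec, x))
        return {
            "z_inv": z_inv,
            "z_spec": z_spec,
            "class_logits": linear(self.f, z_inv),    # would feed the class loss
            "domain_logits": linear(self.h, z_spec),  # would feed the domain loss
        }

    def predict(self, x: Vector) -> int:
        """Inference: only the invariant encoder and the classifier are used."""
        logits = linear(self.f, relu(linear(self.g_inv, x)))
        return max(range(len(logits)), key=logits.__getitem__)


model = PseudoSiameseToy(n_in=4, n_hidden=8, n_classes=3)
sample = [0.2, -0.5, 1.0, 0.3]  # toy per-pixel feature vector (hypothetical values)
print(model.predict(sample))
```

Because `predict` never reads `g_spec` or `h`, the domain-specific branch can be discarded once training is over. This is what makes it possible to deploy only the invariant branch.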

Abstract

Timely, up-to-date land use/land cover (LULC) maps play a pivotal role in supporting agricultural territory management and environmental monitoring, and in facilitating well-informed, sustainable decision-making. Typically, creating a land cover (LC) map requires collecting precise ground truth data through time-consuming and expensive field campaigns. These data are then combined with satellite image time series (SITS) using advanced machine learning algorithms to produce the final map. Unfortunately, each time this process is repeated (e.g., annually over a region to estimate agricultural production or potential biodiversity loss), new ground truth data must be collected, and previously gathered reference data are completely disregarded despite the substantial financial and time investment they required. Making use of historical data, from the same or similar study sites, to enhance the current LULC mapping process is a significant challenge; addressing it would allow the financial and human-resource efforts invested in previous data campaigns to pay off again. To tackle this challenge, we propose a deep learning framework, built on recent advances in domain adaptation and generalization, that combines remote sensing and reference data coming from two different domains (e.g., historical and newly collected data) to improve the current LC mapping process. Our approach, REFeD (data Reuse with Effective Feature Disentanglement for land cover mapping), leverages a disentanglement strategy based on contrastive learning, in which domain-invariant and per-domain specific features are derived to recover the intrinsic information related to the downstream LC mapping task and to alleviate possible distribution shifts between domains. Additionally, REFeD is equipped with an effective supervision scheme in which feature disentanglement is further enforced via multiple levels of supervision at different granularities.
The experimental assessment over two study areas covering highly diverse and contrasting landscapes, namely Koumbia (in West Africa, Burkina Faso) and Centre Val de Loire (in central France), underlines the quality of our framework. The findings demonstrate that out-of-year information coming from the same (or a similar) study site, at different periods of time, can constitute a valuable additional source of information to enhance the LC mapping process.
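The abstract states that disentanglement is driven by contrastive learning but does not give the loss itself. As a hedged illustration only, the sketch below implements a standard InfoNCE term with cosine similarity and a temperature. The pairing is our assumption, not the paper's: the anchor is the invariant feature of a historical-year sample, the positive is the invariant feature of a current-year sample of the same class, and the negative is the anchor sample's own domain-specific feature. The function names, the temperature of 0.1 and the toy feature values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def info_nce(anchor: Vector, positive: Vector, negatives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE: -log of the softmax weight of the positive among all candidates.

    Uses log-sum-exp (shifted by the max logit) for numerical stability.
    """
    logits: List[float] = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    log_partition = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_partition - logits[0]


# Hypothetical toy features (not taken from the paper).
z_inv_hist = [0.9, 0.1, 0.0]   # invariant features, historical-year sample of class c
z_inv_curr = [0.8, 0.2, 0.0]   # invariant features, current-year sample of class c
z_spec_hist = [0.0, 0.1, 0.9]  # domain-specific features of the historical sample
loss = info_nce(z_inv_hist, z_inv_curr, [z_spec_hist])
print(f"InfoNCE loss: {loss:.6f}")
```

Minimizing this term pulls same-class invariant features together across years and pushes the invariant and specific views of a sample apart. REFeD's actual contrastive formulation may differ.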

Paper Structure (22 sections, 4 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Overview of the proposed framework. Training and inference stages are distinguished: while the former is performed on data coming from both domains, the latter is done exclusively on target data and uses only the domain-invariant branch ($g_{inf}, f$) of the learned model.
  • Figure 2: Architecture of the proposed pseudo-siamese network used in the training stage, composed of two independent branches that disentangle domain-invariant information (top branch) from domain-specific information (bottom branch). Class ($\mathcal{L}_{cl}$) and domain ($\mathcal{L}_{dom}$) discrimination losses are used on the top and bottom branches, respectively, while a multi-level contrastive loss ($\mathcal{L}_{con}$) is applied to intermediate features at different depths from both branches. At inference time, only the domain-invariant encoder is used to classify the target domain.
  • Figure 3: View and location of the Koumbia study site. The 2020 ground truth data are superimposed on a Sentinel-2 image covering the whole area. The red box (bottom right) shows a more detailed view of the study site.
  • Figure 4: View and location of the Centre Val de Loire study site. The 2018 and 2021 ground truth data are superimposed on a Sentinel-2 image. On the right, a detailed view of each area is shown, in red for 2018 and in blue for 2021.
  • Figure 5: Extracts from the land cover maps produced by each method. Ground truth areas are outlined over the extracts using the same color codes as in Fig. 3.
  • ...and 1 more figure
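The Figure 2 caption names three training losses but not how they are combined. Here is a minimal sketch of one possible overall objective. It assumes an unweighted sum of the class and domain terms, a hypothetical weight $\lambda$, and a contrastive term decomposed over $D$ intermediate depths. Both $\lambda$ and the per-depth terms $\mathcal{L}_{con}^{(d)}$ are our assumptions, not the paper's stated formulation.

```latex
\mathcal{L} = \mathcal{L}_{cl} + \mathcal{L}_{dom} + \lambda \sum_{d=1}^{D} \mathcal{L}_{con}^{(d)}
```

None of these terms is evaluated at inference, where only the domain-invariant branch ($g_{inf}, f$) produces the map.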