Impacts of Color and Texture Distortions on Earth Observation Data in Deep Learning

Martin Willbo, Aleksis Pirinen, John Martinsson, Edvin Listo Zec, Olof Mogren, Mikael Nilsson

TL;DR

This work investigates how color and texture distortions at test time affect deep learning-based EO land-cover segmentation. It systematically evaluates three architectures on the OpenEarthMap dataset under color (gray-scale) and texture (pixel-swap) distortions, with distortions applied per class and no distortions used during training. The results show robust performance to color changes but pronounced sensitivity to texture rearrangements, and a strong dependence on surrounding context for correct predictions. These findings inform EO-specific augmentation strategies and robustness improvements, suggesting class-dependent approaches and context-aware designs to enhance generalization in Earth observation tasks.

Abstract

Land cover classification and change detection are two important applications of remote sensing and Earth observation (EO) that have benefited greatly from the advances of deep learning. Convolutional and transformer-based U-net models are the state-of-the-art architectures for these tasks, and their performances have been boosted by an increased availability of large-scale annotated EO datasets. However, the influence of different visual characteristics of the input EO data on a model's predictions is not well understood. In this work we systematically examine model sensitivities with respect to several color- and texture-based distortions on the input EO data during inference, given models that have been trained without such distortions. We conduct experiments with multiple state-of-the-art segmentation networks for land cover classification and show that they are in general more sensitive to texture than to color distortions. Beyond revealing intriguing characteristics of widely used land cover classification models, our results can also be used to guide the development of more robust models within the EO domain.
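
As a rough illustration of what a per-class color distortion at inference time could look like, the sketch below converts only the pixels of one land-cover class to gray-scale and leaves every other pixel untouched. It is not the authors' implementation: the function name `grayscale_class`, the `(H, W, 3)` RGB layout, and the BT.601 luma weights are assumptions made for this example.

```python
import numpy as np

def grayscale_class(image, labels, class_id):
    """Gray out the pixels of one class; all other pixels keep their color.

    image:    (H, W, 3) RGB array (assumed layout)
    labels:   (H, W) integer array of ground-truth class ids
    class_id: the class whose pixels are distorted
    """
    out = image.astype(np.float32).copy()
    mask = labels == class_id                      # (H, W) boolean mask
    # Assumption: BT.601 luma weights; the paper may define gray-scale differently.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = out @ weights                           # (H, W) gray value per pixel
    out[mask] = gray[mask][:, None]                # same gray value in all 3 channels
    return out
```

The distorted image is then passed to a model that was trained on undistorted data, and the segmentation quality on the selected class is compared with the undistorted case.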

Paper Structure (5 sections, 17 figures)

Figures (17)

  • Figure 1: Example image from the training dataset (OpenEarthMap). The class considered here is tree. Yellow and dark blue respectively show pixels predicted as tree and not tree. Top row: Original image, image with the gray-scale transformation (color distortion) applied, and image with the pixel-swap transformation (texture distortion) applied, respectively. Note that in the middle image the trees are gray, even if they appear to be in color at a glance. Bottom row: Model predictions for the corresponding images in the top row. The transformations are defined in the method section; more transformations are explored in the appendix. Predictions made using U-Net-Efficientnet-B4.
  • Figure 2: Impact of the gray-scale (top) and pixel-swap (bottom) transformations at test time on the validation set for the three segmentation models outlined in the method section. From left to right: U-Net-Efficientnet-B4, DeeplabV3-Resnet50, and FTUNetFormer. The solid black curve is the mean of the colored curves (validation data), and the dashed black curve is the corresponding mean on training data (included for comparison). Models are generally more sensitive to texture than to color distortions. The pixel-swap curves are the mean over three realisations of the pixel-swap transform.
  • Figure 3: Top: Zanzibar region, pixel-swap transformation on the bare class with proportion $p$ swapped, where $p \in \{0, 0.33, 0.66, 1\}$ (left to right). Middle: Corresponding model predictions, where yellow and dark blue respectively show pixels predicted as bare and not bare. The border region remains correctly classified regardless of transformation intensity, which indicates that the surrounding context is critical. Bottom: Same as middle, with predictions obtained from the same images and distortions as in the top row, except that all pixels other than bare ones have been masked out by replacing them with the per-channel mean of the training set (more such results are in the appendix). The importance of context is clear. Predictions made using the U-Net-Efficientnet-B4 model.
  • Figure 4: Impact of the pixel-swap transformation where all pixels except for the class under investigation are replaced by the per-channel mean of the training set. Results are shown for the three segmentation models outlined in the method section. From left to right: U-Net-Efficientnet-B4, DeeplabV3-Resnet50, and FTUNetFormer. The solid black curve is the mean of the colored curves (validation data), and the dashed black curve is the corresponding mean on training data (included for comparison). Models generally perform significantly worse than when context is kept intact (see also Fig. 5).
  • Figure 5: Comparison of keeping context intact (blue) and removing context (red), for various proportions of the pixel-swap transformation. The blue curves are identical to the means (in black) of Fig. 2. The red curves are identical to the means (in black) of Fig. 4. From left to right: U-Net-Efficientnet-B4, DeeplabV3-Resnet50, and FTUNetFormer. There is a significant performance drop at all proportions $p$, even when no pixel-swap is applied ($p = 0$), and the difference between the training and validation sets is vastly smaller when surrounding context is removed.
  • ...and 12 more figures
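
The captions above describe two operations that a short sketch may make more concrete: swapping a proportion $p$ of the pixels within one class (texture distortion), and removing context by replacing all other pixels with the per-channel training-set mean. The code below is one plausible reading of those captions, not the authors' implementation. The function names, the random-permutation swapping scheme, and the argument `channel_mean` are assumptions made for this example.

```python
import numpy as np

def pixel_swap_class(image, labels, class_id, p, rng):
    """Randomly rearrange a proportion p of the pixels belonging to one class.

    Assumption: "pixel-swap" is read here as permuting the positions of a
    random subset of the class pixels; the paper's exact scheme may differ.
    """
    out = image.copy()
    ys, xs = np.nonzero(labels == class_id)        # coordinates of class pixels
    n_swap = int(round(p * ys.size))
    if n_swap < 2:                                 # nothing to rearrange
        return out
    chosen = rng.choice(ys.size, size=n_swap, replace=False)
    shuffled = rng.permutation(chosen)             # new source index per target
    out[ys[chosen], xs[chosen]] = image[ys[shuffled], xs[shuffled]]
    return out

def mask_context(image, labels, class_id, channel_mean):
    """Replace every pixel outside the class with a per-channel mean value.

    channel_mean: length-3 array, assumed to be precomputed on the training set.
    """
    out = image.copy()
    out[labels != class_id] = channel_mean
    return out
```

A hypothetical usage, with `p = 0.66` as in one of the Figure 3 panels:

```python
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
labels = np.zeros((64, 64), dtype=int)
labels[16:48, 16:48] = 1
distorted = pixel_swap_class(image, labels, class_id=1, p=0.66, rng=rng)
no_context = mask_context(distorted, labels, class_id=1,
                          channel_mean=image.mean(axis=(0, 1)))
```

Because the swap only moves pixel values around inside the class region, the class's color distribution is preserved while its spatial texture is destroyed, which is what separates this distortion from the gray-scale one.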