LEPA: Learning Geometric Equivariance in Satellite Remote Sensing Data with a Predictive Architecture

Erik Scheurer, Rocco Sedona, Stefan Kesselheim, Gabriele Cavallaro

TL;DR

A Learned Equivariance-Predicting Architecture (LEPA) is proposed, which conditions a predictor on geometric augmentations to directly predict the transformed embedding, enabling accurate geometric adjustment without re-encoding.

Abstract

Geospatial foundation models provide precomputed embeddings that serve as compact feature vectors for large-scale satellite remote sensing data. While these embeddings can reduce data-transfer bottlenecks and computational costs, Earth observation (EO) applications can still face geometric mismatches between user-defined areas of interest and the fixed precomputed embedding grid. Standard latent-space interpolation is unreliable in this setting because the embedding manifold is highly non-convex, yielding representations that do not correspond to realistic inputs. We verify this using Prithvi-EO-2.0 to understand the shortcomings of interpolation applied to patch embeddings. As a substitute, we propose a Learned Equivariance-Predicting Architecture (LEPA). Instead of averaging vectors, LEPA conditions a predictor on geometric augmentations to directly predict the transformed embedding. We evaluate LEPA on NASA/USGS Harmonized Landsat-Sentinel (HLS) imagery and ImageNet-1k. Experiments show that standard interpolation achieves a mean reciprocal rank (MRR) below 0.2, whereas LEPA increases MRR to over 0.8, enabling accurate geometric adjustment without re-encoding.
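The mean reciprocal rank figures quoted in the abstract can be read as a retrieval score: each predicted (or interpolated) embedding is used as a query against a pool of candidate target embeddings, and the score is the average of 1/rank of the correct target. The sketch below is a minimal illustration under assumptions that are not taken from the paper: cosine similarity as the ranking function, a single shared candidate pool, and the hypothetical names `cosine` and `mean_reciprocal_rank`. The paper's exact evaluation protocol is not given in this excerpt.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_reciprocal_rank(queries, candidates, true_index):
    """Average of 1/rank, where queries[i] should retrieve candidates[true_index[i]].

    Assumption: candidates are ranked by cosine similarity to the query.
    """
    total = 0.0
    for query, target in zip(queries, true_index):
        sims = [cosine(query, c) for c in candidates]
        # Rank is 1 plus the number of candidates strictly more similar
        # to the query than the correct target is.
        rank = 1 + sum(1 for s in sims if s > sims[target])
        total += 1.0 / rank
    return total / len(queries)


# Toy candidate pool of three "target" embeddings.
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Queries that land close to their own targets are each ranked first.
close_queries = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
mrr_close = mean_reciprocal_rank(close_queries, candidates, [0, 1, 2])

# A query that lands closer to a different candidate is ranked second.
mrr_off = mean_reciprocal_rank([[0.0, 1.0]], candidates, [0])
```

Under this reading, an MRR near 1 means the adjusted embedding almost always retrieves the embedding of the truly transformed input, while a low MRR means other candidates are usually closer.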

Paper Structure (17 sections, 2 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: LEPA training architecture: the original input is passed to the student encoder, which produces patch embeddings. These patch embeddings, together with the transformation parameters, form the context for the predictor. Predicted embeddings are compared against the teacher outputs, which are computed from a transformed input image. Masking is omitted for visual clarity.
  • Figure 2: Input and reconstructions of rotated and downsampled images using a finetuned version of Prithvi [prithvi2]. The bottom left images depict reconstructions from a transformed latent; the bottom right images are baseline reconstructions from rotations in image space.
  • Figure 3: Box plot of the normalized score based on the results of the MRR table (tab:mrr).
  • Figure 4: Example images with class-specific noise. Bottom rows show the first two PCA components (per image) mapped to a color wheel. In the ImageNet model without a CLS-token, background embeddings resemble the subject, a pattern not seen consistently in HLS data.
  • Figure 5: Angle sweep of a selected image comparing the first two PCA components using interpolation in image space (Targets), the predictions using finetuned LEPA, and nearest neighbor interpolation.
  • ...and 2 more figures
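
The nearest neighbor interpolation baseline named in Figure 5 can be illustrated on a small grid of patch embeddings. The sketch below rotates the grid about its center and fills each output cell with a copy of the nearest source embedding. It is an illustration under assumptions, not the paper's implementation: the function name `rotate_grid_nearest`, the zero fill for cells that map outside the grid, and the rotation sign convention are all hypothetical choices.

```python
import math


def rotate_grid_nearest(grid, angle_deg):
    """Rotate an H x W grid of embedding vectors about its center.

    Each output cell is inverse-mapped into the source grid and filled
    with the nearest source embedding. Cells that map outside the grid
    are zero-filled (an assumption made for this sketch).
    """
    height, width = len(grid), len(grid[0])
    dim = len(grid[0][0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Offset of the output cell from the grid center.
            dx, dy = x - cx, y - cy
            # Inverse rotation gives the source coordinates to sample.
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            # Round to the nearest source cell.
            ix = int(math.floor(sx + 0.5))
            iy = int(math.floor(sy + 0.5))
            if 0 <= ix < width and 0 <= iy < height:
                row.append(list(grid[iy][ix]))
            else:
                row.append([0.0] * dim)
        out.append(row)
    return out


# Toy 3 x 3 grid of one-dimensional "embeddings" numbered 0..8.
toy_grid = [[[float(3 * r + c)] for c in range(3)] for r in range(3)]
rotated = rotate_grid_nearest(toy_grid, 90)
```

Because every output cell is a copy of one existing embedding, this baseline never averages vectors, but it can only move embeddings between grid positions; it does not change an embedding to reflect the rotated content inside its patch.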