Measuring the Intrinsic Dimension of Earth Representations

Arjun Rao, Marc Rußwurm, Konstantin Klemmer, Esther Rolf

TL;DR

This work introduces intrinsic dimension (ID) as an architecture- and task-agnostic metric to quantify the information content of geographic implicit neural representations (INRs). By separately assessing representativeness in embedding space and task-alignment in activation space, it reveals how ID relates to downstream performance and uncovers spatial artifacts tied to pretraining data and architectures. The study shows global ID is much smaller than the ambient embedding dimension but grows with higher spatial resolution and more modalities, while local ID highlights region-specific biases. Together, these findings position ID as a practical unsupervised diagnostic tool to guide pretraining design, model selection, and evaluation in geographic INRs.

Abstract

Within the context of representation learning for Earth observation, geographic Implicit Neural Representations (INRs) embed low-dimensional location inputs (longitude, latitude) into high-dimensional embeddings, through models trained on geo-referenced satellite, image or text data. Despite the common aim of geographic INRs to distill Earth's data into compact, learning-friendly representations, we lack an understanding of how much information is contained in these Earth representations, and where that information is concentrated. The intrinsic dimension of a dataset measures the number of degrees of freedom required to capture its local variability, regardless of the ambient high-dimensional space in which it is embedded. This work provides the first study of the intrinsic dimensionality of geographic INRs. Analyzing INRs with ambient dimension between 256 and 512, we find that their intrinsic dimensions fall roughly between 2 and 10 and are sensitive to changing spatial resolution and input modalities during INR pre-training. Furthermore, we show that the intrinsic dimension of a geographic INR correlates with downstream task performance and can capture spatial artifacts, facilitating model evaluation and diagnostics. More broadly, our work offers an architecture-agnostic, label-free metric of information content that can enable unsupervised evaluation, model selection, and pre-training design across INRs.
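
To make the notion of a global intrinsic dimension concrete, the following is a minimal sketch of a TwoNN-style estimate, assuming the maximum-likelihood form $d = N / \sum_i \log(r_{2,i} / r_{1,i})$, where $r_{1,i}$ and $r_{2,i}$ are the distances from point $i$ to its first and second nearest neighbors. It is not the authors' implementation: the function name `twonn_id`, the brute-force distance computation, and the synthetic data are illustrative assumptions, and the approach is only practical for a few thousand points.

```python
import numpy as np


def twonn_id(x: np.ndarray) -> float:
    """Global ID estimate from first/second nearest-neighbor distance ratios.

    Hypothetical helper for illustration. Uses the maximum-likelihood form of
    TwoNN, d = N / sum(log(r2 / r1)), with brute-force pairwise distances.
    """
    # Squared pairwise distances via the Gram matrix (O(N^2) memory).
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude each point from its own neighbors

    # After partitioning at index 1, columns 0 and 1 hold the two smallest
    # squared distances in ascending order.
    two_smallest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(two_smallest[:, 0])
    r2 = np.sqrt(two_smallest[:, 1])

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    valid = r1 > 0
    log_mu = np.log(r2[valid] / r1[valid])
    return float(valid.sum() / log_mu.sum())


# Synthetic check: a 2-D sheet linearly embedded in a 64-D ambient space.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(2000, 2))
embedding = latent @ rng.normal(size=(2, 64))
estimate = twonn_id(embedding)
print(f"ambient dim = {embedding.shape[1]}, estimated ID = {estimate:.2f}")
```

In this toy setup the ambient dimension is 64 while the data has only two degrees of freedom, which mirrors the gap between ambient and intrinsic dimension that the abstract describes.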

Paper Structure

This paper contains 24 sections, 8 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Estimating the intrinsic dimension (ID) of geographic implicit neural representations (INRs). We compute the ID of geographic INRs in two ways, to measure model representativeness and task-alignment. Representativeness (left): We generate location embeddings with frozen pre-trained location encoders for coordinates across Earth's land mass. We calculate the global and local ID values on the resulting embeddings. Task-alignment (right): We train a downstream task-specific model using location embeddings as input. We use a TwoNN ID estimator to measure the ID of the activations of the task-specific model's last hidden layer.
  • Figure 2: Local intrinsic dimension of geographic INRs reveals spatial artifacts. We use the MLE estimator on embeddings generated over Earth's landmass. $N=100,000$ points sampled with $k=100$ neighbors used in the MLE ID calculation. We plot the local ID of more INRs in Appendix \ref{fig:appendix_local_id_plots}.
  • Figure 3: Relationship between global ID of geographic INRs and downstream task performance measured across five regression and classification tasks. In both rows, the location embeddings are frozen while task-specific prediction heads (3-layer MLPs) are learned. In (a), ID (horizontal axis) is calculated on the frozen pre-trained embeddings as in Table \ref{tab:global_ids}. In (b), ID is measured in activation space using the TwoNN estimator on a learned classifier's penultimate layer.
  • Figure 4: FisherS global ID of task-specific location embeddings learned via supervised learning vs. test $R^2$ of four continuous location encoders on five tasks from TorchSpatial \cite{torchspatial}. FisherS ID is calculated on the intermediate location embeddings from the location encoder, similar to Figure \ref{fig:rq2-ssl}. The asset index, sanitation index, and women education tasks are image-location regression tasks, as detailed in Section \ref{sec:exp}.
  • Figure 5: Effect of location encoder spatial resolution on global ID. (Left) For SatCLIP, we pre-train the location encoder with $L=10,20$, and $40$ Legendre polynomials. (Middle) For GeoCLIP, we increase both the maximum RFF frequency and the number of hierarchical levels (M) used by the location encoder by fine-tuning the new higher-frequency branches on a YFCC \cite{yfcc} image geo-localization task. (Right) For Sphere2Vec (S2V) and Space2Vec (Space2V) encoders, we increase the number of frequency components (S) and train the location encoder with supervised learning on the MOSAIKS nightlights regression task.
  • ...and 6 more figures
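
The local ID maps mentioned in the Figure 2 caption rely on a $k$-nearest-neighbor MLE estimator. A minimal sketch of one such estimator is given below, assuming the Levina-Bickel form $\hat{m}_k(x) = \left[\frac{1}{k-1}\sum_{j=1}^{k-1} \log \frac{T_k(x)}{T_j(x)}\right]^{-1}$, where $T_j(x)$ is the distance from $x$ to its $j$-th nearest neighbor. Whether the paper uses exactly this variant is an assumption; the function name `local_mle_id` and the synthetic data are hypothetical. The brute-force distance matrix here only scales to a few thousand points, so a setting like $N=100,000$ with $k=100$ would need a nearest-neighbor index instead.

```python
import numpy as np


def local_mle_id(x: np.ndarray, k: int = 20) -> np.ndarray:
    """Per-point ID estimates from each point's k nearest neighbors.

    Hypothetical helper for illustration, using the Levina-Bickel form
    m_k(x) = (k - 1) / sum_{j<k} log(T_k / T_j).
    """
    # Squared pairwise distances via the Gram matrix (O(N^2) memory).
    sq = np.sum(x**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    # Keep the k smallest squared distances per row, sorted ascending.
    knn_d2 = np.sort(np.partition(d2, k - 1, axis=1)[:, :k], axis=1)
    t = np.maximum(np.sqrt(knn_d2), np.finfo(float).tiny)  # avoid log(0)

    # log(T_k / T_j) for j = 1..k-1, then invert the mean.
    log_ratio = np.log(t[:, -1:] / t[:, :-1])
    return (k - 1) / np.sum(log_ratio, axis=1)


# Synthetic check: a 2-D sheet linearly embedded in a 64-D ambient space.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1500, 2))
embedding = latent @ rng.normal(size=(2, 64))
local_ids = local_mle_id(embedding, k=20)
print(f"mean local ID = {local_ids.mean():.2f}")
```

Because the function returns one value per input point, the output can be plotted against the coordinates that produced each embedding, which is how a map of local ID would expose region-specific structure.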