GeoSANE: Learning Geospatial Representations from Models, Not Data

Joelle Hanna; Damian Falk; Stella X. Yu; Damian Borth

Abstract

Recent advances in remote sensing have led to an increase in the number of available foundation models, each trained on different modalities, datasets, and objectives, yet each capturing only part of the vast geospatial knowledge landscape. While these models show strong results within their respective domains, their capabilities remain complementary rather than unified. Therefore, instead of choosing one model over another, we aim to combine their strengths into a single shared representation. We introduce GeoSANE, a geospatial model foundry that learns a unified neural representation from the weights of existing foundation models and task-specific models and can generate novel neural network weights on demand. Given a target architecture, GeoSANE generates weights ready for fine-tuning on classification, segmentation, and detection tasks across multiple modalities. Models generated by GeoSANE consistently outperform their counterparts trained from scratch, match or surpass state-of-the-art remote sensing foundation models, and, when generating lightweight networks, outperform models obtained through pruning or knowledge distillation. Evaluations across ten diverse datasets and on GEO-Bench confirm its strong generalization capabilities. By shifting from pre-training to weight generation, GeoSANE introduces a new framework for unifying and transferring geospatial knowledge across models and tasks. Code is available at https://hsg-aiml.github.io/GeoSANE/.

Paper Structure (32 sections, 1 equation, 6 figures, 12 tables)

Introduction
Related Work
Remote Sensing Foundation Models.
Model Weight Generation.
Model Merging.
Method
Remote Sensing Model Collection
Model Retrieval.
Final Model Collection.
Learning the Shared Latent Representation
Tokenization of Model Weights.
Backbone and Learning Objective.
Generating new Models from the Latent Space
Downstream Tasks and Datasets
Classification
...and 17 more sections

Figures (6)

Figure 2: Overview of our approach. (A) A heterogeneous collection of models, including ViTs, Swins, ResNets, UNets, and vision-language models, is gathered from Hugging Face. (B) A weight-space autoencoder is trained to reconstruct and embed these models into a shared latent representation. (C) From this latent space, GeoSANE can generate new models on demand for specific downstream tasks such as flood segmentation, object detection, or land-cover classification.
Figure 3: Our model collection, retrieved from Hugging Face, is diverse, as seen in the distribution of model categories in our dataset.
Figure 4: UMAP visualization of the latent weight space of GeoSANE, colored by architecture (left) and modality (right). GeoSANE learns a compact shared latent representation of model weights from neural network models of different architectures and modalities (see the supplementary material for details).
Figure 5: Convergence comparison between GeoSANE-initialized models and models trained from scratch.
Figure 6: Qualitative results of the flood segmentation task on the Sen1Floods11 dataset.
...and 1 more figures
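
Panel (B) of Figure 2 and the outline entry "Tokenization of Model Weights" indicate that model weights are turned into tokens before the autoencoder embeds them. This page does not say how that is done, so the following is only a minimal sketch under stated assumptions: each layer's weights are flattened, zero-padded, and cut into fixed-size chunks, with a (layer, chunk) position recorded per token so models of different architectures map to sequences of same-width tokens. The function names, the dictionary input format, and the `token_size` parameter are all hypothetical and are not taken from the GeoSANE code.

```python
import numpy as np


def tokenize_weights(layers, token_size):
    """Flatten each layer and cut it into zero-padded, fixed-size tokens.

    layers: dict mapping layer name -> array (hypothetical input format).
    Returns (tokens, positions): tokens has shape (n_tokens, token_size) and
    positions[i] = (layer_index, chunk_index) for token i.
    """
    tokens, positions = [], []
    for layer_idx, weights in enumerate(layers.values()):
        flat = np.asarray(weights, dtype=np.float32).ravel()
        pad = (-flat.size) % token_size  # zeros needed to fill the last token
        chunks = np.pad(flat, (0, pad)).reshape(-1, token_size)
        tokens.append(chunks)
        positions.extend((layer_idx, c) for c in range(chunks.shape[0]))
    return np.concatenate(tokens, axis=0), np.array(positions)


def detokenize_weights(tokens, positions, shapes):
    """Invert tokenize_weights given the target layer shapes.

    shapes: dict mapping layer name -> shape, in the same order as `layers`.
    """
    layers = {}
    for layer_idx, (name, shape) in enumerate(shapes.items()):
        flat = tokens[positions[:, 0] == layer_idx].ravel()
        size = int(np.prod(shape))
        layers[name] = flat[:size].reshape(shape)  # drop the padding
    return layers


# Toy "model" with two layers of different shapes (illustrative values only).
toy_model = {
    "conv.weight": np.arange(10, dtype=np.float32).reshape(2, 5),
    "fc.bias": np.arange(3, dtype=np.float32),
}
toy_tokens, toy_positions = tokenize_weights(toy_model, token_size=4)
toy_shapes = {name: w.shape for name, w in toy_model.items()}
toy_restored = detokenize_weights(toy_tokens, toy_positions, toy_shapes)
```

In this sketch the target architecture enters only through `shapes`: a sequence of generated tokens would be mapped back to layer tensors for whichever architecture is requested, after which the resulting model would be fine-tuned as the abstract describes.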
