Climplicit: Climatic Implicit Embeddings for Global Ecological Tasks

Johannes Dollinger, Damien Robert, Elena Plekhanova, Lukas Drees, Jan Dirk Wegner

TL;DR

Climplicit presents a high-resolution, lightweight spatio-temporal climate encoder pretrained on CHELSA climatologies to generate implicit climate representations for any location on Earth and any month. By introducing ReSIREN and a direct temporal embedding, the model achieves strong downstream performance across biome classification, species distribution modeling, and plant trait regression while drastically reducing storage needs. Ablation studies confirm the importance of residual connections, temporal encoding, and CHELSA-based pretraining, and the results indicate competitive advantages over existing geolocation encoders. The approach enables easier, more scalable ecological learning with a reduced carbon footprint, albeit with some limitations related to the implicit representation and its resolution relative to the full climate rasters.

Abstract

Deep learning on climatic data holds potential for macroecological applications. However, its adoption remains limited among scientists outside the deep learning community due to storage, compute, and technical expertise barriers. To address this, we introduce Climplicit, a spatio-temporal geolocation encoder pretrained to generate implicit climatic representations anywhere on Earth. By bypassing the need to download raw climatic rasters and train feature extractors, our model uses 3500 times less disk space and significantly reduces computational needs for downstream tasks. We evaluate our Climplicit embeddings on biome classification, species distribution modeling, and plant trait regression. We find that single-layer probing of our Climplicit embeddings consistently performs better than or on par with training a model from scratch on the downstream tasks, and overall better than alternative geolocation encoding models.
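Single-layer probing means fitting one linear layer on top of frozen embeddings. The NumPy sketch below illustrates the idea for a regression task such as plant trait regression, under stated assumptions: random vectors stand in for Climplicit embeddings (the 64-dimensional size is arbitrary), the target is synthetic, and the probe is fitted in closed form with a small L2 penalty (ridge regression). `fit_linear_probe` and `predict_linear_probe` are hypothetical helper names, not part of any Climplicit API.

```python
import numpy as np


def _with_bias(emb):
    # Append a constant column so the probe learns a bias term.
    return np.hstack([emb, np.ones((emb.shape[0], 1))])


def fit_linear_probe(emb, y, l2=1e-3):
    """Fit a single linear layer on frozen embeddings (closed-form ridge regression).

    emb: (n, d) embeddings from a frozen encoder; y: (n, k) targets.
    Only the probe weights are learned; the encoder is never updated.
    """
    X = _with_bias(emb)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def predict_linear_probe(weights, emb):
    return _with_bias(emb) @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((2000, 64))        # stand-in for frozen embeddings
    w_true = rng.standard_normal((64, 1))
    y = emb @ w_true + 0.1 * rng.standard_normal((2000, 1))  # synthetic "trait"
    W = fit_linear_probe(emb[:1500], y[:1500])
    pred = predict_linear_probe(W, emb[1500:])
    y_test = y[1500:]
    r2 = 1 - ((pred - y_test) ** 2).sum() / ((y_test - y_test.mean()) ** 2).sum()
    print(f"held-out R^2 = {r2:.3f}")
```

For a classification task such as biome classification, the same frozen embeddings would instead feed a single softmax layer trained with cross-entropy.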

Paper Structure

This paper contains 25 sections, 1 theorem, 4 equations, 8 figures, 4 tables.

Key Result

Lemma B.1

Let $X, Y \sim \mathcal{N}(0,1)$ be independent and $Z = (X+Y)/\sqrt{2}$. Then $Z \sim \mathcal{N}(0,1)$.
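Because a sum of independent Gaussians is again Gaussian, the lemma reduces to a variance computation: $\mathrm{Var}(X+Y) = 2$, so dividing by $\sqrt{2}$ restores unit variance, whereas dividing by $2$ would give variance $1/2$. A quick Monte Carlo check in Python (sample size and seed are arbitrary choices):

```python
import numpy as np


def lemma_b1_variances(n=1_000_000, seed=0):
    """Empirical variances of (X + Y) / sqrt(2) and (X + Y) / 2 for X, Y ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)  # drawn independently of x
    var_sqrt2 = float(((x + y) / np.sqrt(2.0)).var())  # theory: (1 + 1) / 2 = 1
    var_half = float(((x + y) / 2.0).var())            # theory: (1 + 1) / 4 = 0.5
    return var_sqrt2, var_half


if __name__ == "__main__":
    print(lemma_b1_variances())
```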

Figures (8)

  • Figure 1: Left: Unlike "classic" residual blocks, our ReSIREN residual connection is placed before the non-linearity rather than after it. Right: Scaling comparison of SIREN and ReSIREN on the biome classification task (see the downstream tasks section). ReSIREN underperforms at small depths but scales better. See the appendix on scaling for further details.
  • Figure 2: Empirical probability density function of the sum of two independent $\text{Arcsin}(0,1)$ variables. The sum is an ill-behaved distribution that cannot be trivially normalized back to $\text{Arcsin}(0,1)$. We therefore apply residual connections to the well-behaved Gaussians in the middle of each SIREN layer.
  • Figure 3: Reconstruction error of CHELSA after Climplicit pretraining. Top: absolute error distribution for each climatic variable. Middle: absolute error distribution for each month. Bottom: mean absolute error within each cell of a 136 by 320 grid across the globe.
  • Figure 4: Scaling behavior of ReSIREN compared to SIREN. ReSIREN generally performs worse with few layers, but outperforms SIREN and scales better with more layers, at the cost of longer training time.
  • Figure 5: Comparison of the fully-trained and pretrained embeddings, mapping the predicted biomes in the region around Lake Victoria, covering parts of Kenya, Tanzania, Uganda, Rwanda, Burundi and the Democratic Republic of the Congo. Climplicit achieves a good reconstruction without any hallucinations, albeit missing fine-scale details such as the mangroves along the coast.
  • ...and 3 more figures
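Figures 1 and 2, together with Lemma B.1, suggest where the ReSIREN skip connection sits: on the approximately Gaussian pre-activations inside each SIREN layer rather than on the arcsine-distributed sine outputs. The NumPy sketch below is one possible reading of that design, not the authors' implementation. The $1/\sqrt{2}$ rescaling, the omitted biases, the width, the depth, and $\omega_0 = 30$ are assumptions, and `resiren_block` and `preactivation_stds` are hypothetical names. Weights follow the SIREN hidden-layer initialization $W \sim U(-\sqrt{6/n}/\omega_0, \sqrt{6/n}/\omega_0)$; the sketch tracks the pre-activation standard deviation through a deep stack.

```python
import numpy as np

OMEGA0 = 30.0  # SIREN frequency factor (assumed; 30 is the common SIREN default)


def siren_hidden_weights(rng, fan_in, fan_out, omega0=OMEGA0):
    # SIREN hidden-layer init: W ~ U(-sqrt(6/n)/omega0, +sqrt(6/n)/omega0),
    # so omega0 * (h @ W) has roughly unit variance when Var(h) is near 1/2.
    bound = np.sqrt(6.0 / fan_in) / omega0
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))


def resiren_block(z_prev, W, omega0=OMEGA0):
    """Hypothetical ReSIREN block operating on pre-activations.

    The skip connection adds the previous (approximately Gaussian) pre-activation
    to the new one *before* the sine, then rescales by 1/sqrt(2) so that two
    independent unit-variance Gaussians sum to unit variance (Lemma B.1).
    """
    h = np.sin(z_prev)            # output of the previous layer
    z_new = omega0 * (h @ W)      # SIREN pre-activation (bias omitted)
    return (z_new + z_prev) / np.sqrt(2.0)


def preactivation_stds(depth=12, width=256, batch=2048, seed=0):
    # Standard-normal pre-activations stand in for the output of a first layer.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((batch, width))
    stds = [float(z.std())]
    for _ in range(depth):
        W = siren_hidden_weights(rng, width, width)
        z = resiren_block(z, W)
        stds.append(float(z.std()))
    return stds


if __name__ == "__main__":
    for layer, s in enumerate(preactivation_stds()):
        print(f"layer {layer:2d}: pre-activation std = {s:.3f}")
```

The printed standard deviations should stay close to 1 (settling slightly below it, since $\sin$ of a standard normal has variance a little under $1/2$) rather than drifting with depth. Summing after the sine instead would add two arcsine-like variables, which Figure 2 shows cannot be trivially renormalized.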

Theorems & Definitions (2)

  • Lemma B.1
  • Proof