StefaLand: An Efficient Geoscience Foundation Model That Improves Dynamic Land-Surface Predictions
Nicholas Kraabel, Jiangtao Liu, Yuchen Bian, Daniel Kifer, Chaopeng Shen
TL;DR
The paper tackles spatial generalization in climate-driven land-surface prediction by introducing StefaLand, a statically grounded, attribute-based spatiotemporal foundation model. StefaLand uses a transformer-based masked autoencoder with cross-variable group masking to learn cross-domain interactions between static landscape attributes and dynamic forcings, followed by lightweight fine-tuning with residual adapters (StefaLand-resConn) for task-specific predictions. Across five datasets and four task classes (streamflow, soil moisture, soil composition, and landslide susceptibility), the model achieves state-of-the-art or competitive performance, substantially outperforming purely supervised baselines and alternative pretrained representations. It remains computationally efficient, with roughly 720 GPU hours of pretraining and about 12 million parameters. The results highlight the value of cross-domain representations and of attribute-centric pretraining for data-efficient generalization in data-scarce regions, with practical implications for hydrology and geohazard forecasting. Future work is needed to broaden targets, incorporate image-like data, and add uncertainty quantification.
Abstract
Managing natural resources and mitigating risks from floods, droughts, wildfires, and landslides require models that can accurately predict climate-driven land-surface responses. Traditional models often struggle with spatial generalization because they are trained or calibrated on limited observations and can degrade under concept drift. Recently proposed vision foundation models trained on satellite imagery demand massive compute, and they are not designed for dynamic land-surface prediction tasks. We introduce StefaLand, a generative spatiotemporal Earth representation learning model centered on learning cross-domain interactions to suppress overfitting. StefaLand demonstrates especially strong spatial generalization on five datasets across four important tasks: streamflow, soil moisture, soil composition, and landslides, compared to previous state-of-the-art methods. The domain-inspired design choices include a location-aware masked autoencoder that fuses static and time-series inputs, an attribute-based rather than image-based representation that drastically reduces compute demands, and residual fine-tuning adapters that strengthen knowledge transfer across tasks. StefaLand can be pretrained and fine-tuned on commonly available academic compute resources, yet consistently outperforms state-of-the-art supervised learning baselines, fine-tuned vision foundation models, and commercially available embeddings, highlighting the previously overlooked value of cross-domain interactions and providing assistance to data-poor regions of the world.
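To make the idea of masking whole groups of variables concrete, the following is a minimal Python sketch of group-wise masking for a masked autoencoder. It is illustrative only and is not the StefaLand implementation: the group names, the variable names, the mask ratio, the zero placeholder, and the helper functions `sample_group_mask`, `apply_mask`, and `masked_mse` are all assumptions introduced here. The sketch shows only the masking and the masked-entry reconstruction loss, not the transformer encoder or decoder.

```python
import random

# Hypothetical variable groups; names and groupings are illustrative assumptions.
GROUPS = {
    "soil": ["clay_frac", "sand_frac"],            # static attributes
    "terrain": ["elevation", "slope"],             # static attributes
    "precip": ["precipitation"],                   # dynamic forcing
    "energy": ["temperature", "solar_radiation"],  # dynamic forcings
}


def sample_group_mask(groups, mask_ratio, rng):
    """Pick whole groups to hide and return the set of masked variable names."""
    names = sorted(groups)
    # With two or more groups: mask at least one and keep at least one visible.
    n_mask = min(max(1, round(mask_ratio * len(names))), len(names) - 1)
    chosen = rng.sample(names, n_mask)
    return {var for g in chosen for var in groups[g]}


def apply_mask(sample, masked_vars, mask_value=0.0):
    """Replace masked variables with a placeholder (every step of a time series)."""
    out = {}
    for var, value in sample.items():
        if var in masked_vars:
            out[var] = [mask_value] * len(value) if isinstance(value, list) else mask_value
        else:
            out[var] = value
    return out


def masked_mse(prediction, target, masked_vars):
    """Mean squared reconstruction error over masked entries only."""
    errors = []
    for var in masked_vars:
        p, t = prediction[var], target[var]
        if not isinstance(t, list):
            p, t = [p], [t]
        errors.extend((pi - ti) ** 2 for pi, ti in zip(p, t))
    return sum(errors) / len(errors)


# One made-up site: scalars are static attributes, lists are short time series.
sample = {
    "clay_frac": 0.3, "sand_frac": 0.5, "elevation": 120.0, "slope": 4.0,
    "precipitation": [1.0, 2.0, 0.5],
    "temperature": [10.0, 11.0, 9.5],
    "solar_radiation": [200.0, 210.0, 190.0],
}
rng = random.Random(0)
masked = sample_group_mask(GROUPS, mask_ratio=0.5, rng=rng)
corrupted = apply_mask(sample, masked)
# A real model would predict the hidden variables from the visible ones;
# here the corrupted input stands in for the prediction to exercise the loss.
loss = masked_mse(corrupted, sample, masked)
```

Because entire groups are hidden at once, a model trained this way cannot reconstruct a masked variable from its closest relatives in the same group and has to rely on variables from other groups, which is one plausible reading of how group masking encourages cross-domain interactions between static attributes and dynamic forcings.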
