MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale
Ya Wen, Jixuan Cai, Qiyao Ma, Linyan Li, Xinhua Chen, Chris Webster, Yulun Zhou
TL;DR
MoRA addresses the challenge of learning generalizable geospatial representations by grounding location semantics in human mobility and the functional relationships between locations. It introduces a mobility-as-backbone framework that aligns POIs, imagery, and demographics to a billion-edge mobility graph, using a CLIP-style contrastive objective and a LightGCN-based encoder over an H3 grid. The resulting compact 128-dimensional embedding improves $R^2$ by an average of 12.9% across nine downstream socio-economic tasks, and the experiments show scaling behavior in geospatial representation learning. Code and a distilled inference model are open-sourced. The work advances practical, scalable, and privacy-conscious geospatial inference at national scale, with strong cross-task performance and robustness across spatial resolutions.
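The TL;DR names a LightGCN-based encoder over an H3 grid. To illustrate how LightGCN-style propagation works on a mobility graph, here is a minimal NumPy sketch that treats each H3 cell as a node and a cell-to-cell trip matrix as edge weights. Several choices are assumptions rather than details from the source: directed flows are symmetrized, intra-cell trips are dropped, and the names `symmetric_normalize` and `lightgcn_embed` are hypothetical. It is an illustration of the technique, not MoRA's encoder.

```python
import numpy as np


def symmetric_normalize(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} A D^{-1/2}; rows/columns of zero-degree nodes stay zero."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros(deg.shape, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def lightgcn_embed(flows: np.ndarray, init_emb: np.ndarray, num_layers: int = 3) -> np.ndarray:
    """LightGCN-style propagation over a cell-to-cell mobility graph.

    flows[i, j]: trips from cell i to cell j (hypothetical input format).
    init_emb: (num_cells, dim) initial cell embeddings.
    """
    adj = (flows + flows.T).astype(float)  # assumption: treat flows as undirected
    np.fill_diagonal(adj, 0.0)              # assumption: ignore intra-cell trips
    norm_adj = symmetric_normalize(adj)
    layer = init_emb
    layers = [layer]
    for _ in range(num_layers):
        layer = norm_adj @ layer            # no weight matrix, no nonlinearity
        layers.append(layer)
    return np.mean(layers, axis=0)          # uniform layer weights 1/(K+1)


# Toy usage: 4 cells; cell 3 has no trips at all.
flows = np.array([[0, 5, 1, 0],
                  [3, 0, 2, 0],
                  [0, 4, 0, 0],
                  [0, 0, 0, 0]])
rng = np.random.default_rng(0)
init = rng.normal(size=(4, 8))
emb = lightgcn_embed(flows, init, num_layers=3)
```

With `num_layers=3`, each cell's output averages its initial vector with its one-, two-, and three-hop aggregates, so an isolated cell keeps one quarter of its initial vector. LightGCN's design choice is to drop per-layer weight matrices and nonlinearities, which leaves the initial embeddings as the only learnable parameters.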
Abstract
Representation learning of geospatial locations remains a core challenge in achieving general geospatial intelligence, and the field's philosophies and techniques are increasingly diverging. While Earth observation paradigms excel at depicting the physical state of locations, we argue that a location's comprehensive "meaning" is better grounded in its internal human activity patterns and, crucially, in its functional relationships with other locations, as revealed by human movement. We present MoRA, a human-centric geospatial framework that uses a mobility graph as its backbone to fuse multiple data modalities, with the aim of learning embeddings that represent the socio-economic context and functional role of each location. MoRA combines spatial tokenization, graph neural networks (GNNs), and asymmetric contrastive learning to align 100M+ POIs, large-scale remote sensing imagery, and structured demographic statistics with a billion-edge mobility graph, so that the three auxiliary modalities are interpreted through the lens of fundamental human dynamics. To rigorously evaluate MoRA, we construct a benchmark of 9 downstream prediction tasks spanning social and economic domains. Experiments show that MoRA, with four input modalities and a compact 128-dimensional representation space, outperforms state-of-the-art models in predictive performance by an average of 12.9%. Echoing LLM scaling laws, we further demonstrate scaling behavior in geospatial representation learning. We open-source code and pretrained models at: https://github.com/ylzhouchris/MoRA.
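The abstract describes aligning each auxiliary modality with the mobility graph through contrastive learning, and the TL;DR calls the objective CLIP-style. Here is a minimal NumPy sketch of the standard symmetric CLIP (InfoNCE) loss, assuming row i of each embedding matrix describes the same H3 cell. The abstract calls MoRA's objective asymmetric, but it does not say how the asymmetry is realized, so this sketch does not model it. The function names and the 0.07 temperature (CLIP's usual initial value) are assumptions for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def diag_cross_entropy(logits: np.ndarray) -> float:
    """Mean cross-entropy where the positive for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def clip_loss(mob_emb: np.ndarray, aux_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE between mobility embeddings and one auxiliary modality.

    Row i of both matrices is assumed to describe the same H3 cell; every other
    row in the batch acts as a negative.
    """
    logits = l2_normalize(mob_emb) @ l2_normalize(aux_emb).T / temperature
    # Average the mobility->auxiliary and auxiliary->mobility directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


# Toy usage: matched pairs should give a lower loss than shuffled pairs.
mob = np.eye(4)
aligned = clip_loss(mob, np.eye(4))
shuffled = clip_loss(mob, np.eye(4)[[1, 0, 3, 2]])
```

Every other cell in the batch serves as a negative, so larger batches supply more negatives per step. Lowering the temperature sharpens the softmax over those negatives; in CLIP the temperature is a learned parameter rather than a fixed constant.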
