MoRA: Mobility as the Backbone for Geospatial Representation Learning at Scale

Ya Wen, Jixuan Cai, Qiyao Ma, Linyan Li, Xinhua Chen, Chris Webster, Yulun Zhou

TL;DR

MoRA addresses the challenge of learning generalizable geospatial representations by grounding location semantics in human mobility and functional inter-location relationships. It introduces a mobility-as-backbone framework that aligns POIs, imagery, and demographics to a billion-edge mobility graph using a CLIP-style objective and a LightGCN-based encoder over an H3 grid. The approach yields a compact $128$-dimensional embedding that achieves an average $R^2$ improvement of $12.9\%$ across nine downstream socio-economic tasks and exhibits scaling behavior in geospatial representation learning; code and a distilled inference model are open-sourced. The work targets practical, scalable, and privacy-conscious geospatial inference at national scale, with strong cross-task performance and robustness across spatial resolutions.
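To make the "LightGCN-based encoder" concrete, the following is a minimal sketch of LightGCN-style propagation: parameter-free neighbor averaging with symmetric degree normalization, followed by a uniform mean over layers. It is an illustration under stated assumptions, not the authors' implementation. The function name `lightgcn_propagate`, the dictionary edge format, and the use of flow volumes as edge weights are all hypothetical choices; the summary does not say how mobility edges are weighted.

```python
import math


def lightgcn_propagate(edges, embeddings, num_layers=2):
    """LightGCN-style propagation on an undirected, weighted graph.

    edges: dict mapping (i, j) -> weight. Here the weight stands in for a
        mobility flow between grid cells i and j (an assumption for this
        sketch). Self-loops are assumed to be absent.
    embeddings: list of per-cell vectors (the layer-0 embeddings).
    Returns the uniform mean of the layer-0 .. layer-num_layers embeddings.
    """
    n = len(embeddings)
    dim = len(embeddings[0])

    # Weighted degree of every node.
    degree = [0.0] * n
    for (i, j), w in edges.items():
        degree[i] += w
        degree[j] += w

    layers = [embeddings]
    current = embeddings
    for _ in range(num_layers):
        nxt = [[0.0] * dim for _ in range(n)]
        for (i, j), w in edges.items():
            # Symmetric normalization: w_ij / sqrt(d_i * d_j).
            norm = w / math.sqrt(degree[i] * degree[j])
            for d in range(dim):
                nxt[i][d] += norm * current[j][d]
                nxt[j][d] += norm * current[i][d]
        layers.append(nxt)
        current = nxt

    # Layer combination: uniform average, with no learned weights.
    return [
        [sum(layer[v][d] for layer in layers) / len(layers) for d in range(dim)]
        for v in range(n)
    ]


# Two cells joined by one edge: after one layer each cell has taken on its
# neighbor's embedding, so the layer mean mixes the two.
mixed = lightgcn_propagate({(0, 1): 1.0}, [[1.0, 0.0], [0.0, 1.0]], num_layers=1)
```

Because propagation has no trainable weights or nonlinearities, the only learned parameters in this scheme are the layer-0 embeddings, which is what keeps this family of encoders cheap on very large graphs.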

Abstract

Representation learning of geospatial locations remains a core challenge in achieving general geospatial intelligence, with increasingly diverging philosophies and techniques. While Earth observation paradigms excel at depicting locations in their physical states, we claim that a location's comprehensive "meaning" is better grounded in its internal human activity patterns and, crucially, its functional relationships with other locations, as revealed by human movement. We present MoRA, a human-centric geospatial framework that leverages a mobility graph as its core backbone to fuse various data modalities, aiming to learn embeddings that represent the socio-economic context and functional role of a location. MoRA achieves this through the integration of spatial tokenization, GNNs, and asymmetric contrastive learning to align 100M+ POIs, massive remote sensing imagery, and structured demographic statistics with a billion-edge mobility graph, ensuring that the three auxiliary modalities are interpreted through the lens of fundamental human dynamics. To rigorously evaluate the effectiveness of MoRA, we construct a benchmark dataset composed of 9 downstream prediction tasks across social and economic domains. Experiments show that MoRA, with four input modalities and a compact 128-dimensional representation space, outperforms state-of-the-art models in predictive performance by an average of 12.9%. Echoing LLM scaling laws, we further demonstrate scaling behavior in geospatial representation learning. We open-source code and pretrained models at: https://github.com/ylzhouchris/MoRA.
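The contrastive alignment described above can be illustrated with a standard CLIP-style (InfoNCE) loss between mobility embeddings and one auxiliary modality. This is a hedged sketch only: it shows the common symmetric form, whereas the abstract refers to an asymmetric variant whose details are not given here. The function names, the temperature value of 0.07, and the list-of-lists input format are assumptions made for illustration.

```python
import math


def _normalize(vec):
    # L2-normalize a vector; assumes the vector is nonzero.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    # Mean over rows of -log softmax(row)[i], i.e. the matching pair for
    # row i is column i. Uses the log-sum-exp trick for stability.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(mobility, auxiliary, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    mobility[i] and auxiliary[i] are assumed to describe the same grid
    cell; every other pairing in the batch serves as a negative.
    """
    a = [_normalize(v) for v in mobility]
    b = [_normalize(v) for v in auxiliary]
    # Cosine similarity matrix scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(u, v)) / temperature for v in b]
        for u in a
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the mobility-to-auxiliary and auxiliary-to-mobility terms.
    return 0.5 * (
        _diagonal_cross_entropy(logits) + _diagonal_cross_entropy(transposed)
    )


cells = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = clip_style_loss(cells, [[1.0, 0.0], [0.0, 1.0]])
loss_swapped = clip_style_loss(cells, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, `loss_aligned` is close to zero because each cell is most similar to its own pair, while `loss_swapped` is large because each cell is most similar to the wrong pair. Minimizing the loss therefore pulls the auxiliary embedding of a cell toward that cell's mobility embedding.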

Paper Structure

This paper contains 41 sections, 5 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: The methodological framework of MoRA.
  • Figure 2: Scaling behavior of downstream task performance with pretraining data size and spatial coverage. The map on the right shows the spatial extents of Jiangsu Province, East China, and all of China, which cover 3,904, 28,855, and 195,574 H3 cells, respectively, used for pretraining. The left panel shows average $R^2$ values for all tasks, social tasks, and economic tasks, while the middle panel shows task-specific scaling behavior. Detailed numbers are given in the scaling appendix.
  • Figure 3: Model comparison results for ablation studies and sensitivity analysis.
  • Figure 4: The geographical distribution of various downstream dataset samples.