Data-driven discovery of human mobility models
Hao Guo, Weiyu Zhang, Junjie Yang, Yuanqiao Hou, Lei Dong, Yu Liu
TL;DR
This work addresses the scarcity of principled analytic mobility models by applying symbolic regression to mobility data from multiple countries, distilling interpretable expressions directly from observations. By modeling allocation weights and balancing predictive accuracy against expression complexity, the approach recovers the classic exponential-decay gravity model and reveals novel forms such as an exponential-power-law distance decay, which can be interpreted through the maximum entropy principle. The study demonstrates geographic heterogeneity and robustness to noise, and it shows how mobility models can be extended beyond traditional forms. Combining data-driven discovery with entropy-based interpretation yields a systematic framework for uncovering mathematical structure in social phenomena and for understanding and predicting human mobility at multiple scales.
Abstract
Human mobility is a fundamental aspect of social behavior, with broad applications in transportation, urban planning, and epidemic modeling. However, new mathematical formulas for modeling mobility phenomena have been scarce for decades, and those that exist, such as the gravity model and the radiation model, were usually discovered by analogy to physical processes. These sporadic discoveries are often thought to rely on intuition and luck in fitting empirical data. Here, we propose a systematic approach that leverages symbolic regression to automatically discover interpretable models from human mobility data. Our approach finds several well-known formulas, such as the distance decay effect and classical gravity models, as well as previously unknown ones, such as an exponential-power-law decay that can be explained by the maximum entropy principle. By relaxing the constraints on the complexity of model expressions, we further show how key variables of human mobility are progressively incorporated into the model, making this framework a powerful tool for revealing the underlying mathematical structures of complex social phenomena directly from observational data.
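To make the idea of an exponential-decay gravity model expressed through allocation weights concrete, the following minimal Python sketch may help. It assumes a destination-normalized form in which the weight of destination j for a given origin is proportional to m_j * exp(-beta * d_ij). This specific form, the function name `allocation_probabilities`, the symbols m_j, d_ij, and beta, and the toy numbers are all illustrative assumptions and are not taken from the paper.

```python
import math


def allocation_probabilities(dest_masses, distances, beta):
    """Share of one origin's outflow allocated to each destination.

    Assumed (hypothetical) form: weight_j = m_j * exp(-beta * d_ij),
    normalized so that the shares over all destinations sum to 1.
    """
    weights = [m * math.exp(-beta * d) for m, d in zip(dest_masses, distances)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy inputs, invented purely for illustration.
dest_masses = [100.0, 50.0, 200.0]  # e.g. destination populations
distances = [1.0, 2.0, 5.0]         # distances from the origin
beta = 0.5                          # distance-decay rate (assumed value)

probs = allocation_probabilities(dest_masses, distances, beta)

# Scale the shares by a hypothetical total outflow to get predicted flows.
outflow = 1000.0
flows = [outflow * p for p in probs]

for j, (p, f) in enumerate(zip(probs, flows)):
    print(f"destination {j}: share={p:.3f}, flow={f:.1f}")
```

Under these assumptions, a larger beta concentrates the outflow on nearby destinations, while beta = 0 allocates trips in proportion to destination mass alone. A symbolic regression search would treat the functional form itself as unknown rather than fixing it as done here.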
