Enhancing Ride-Hailing Forecasting at DiDi with Multi-View Geospatial Representation Learning from the Web
Xixuan Hao, Guicheng Li, Daiqiang Wu, Xusen Guo, Yumeng Zhu, Zhichao Zou, Peng Zhen, Yao Yao, Yuxuan Liang
TL;DR
This paper tackles the challenge of forecasting ride-hailing demand and supply under geospatial heterogeneity and external shocks. It introduces MVGR-Net, a two-stage framework. The first stage learns region representations from semantic POI attributes and temporal mobility patterns. The second stage generates forecasts through a prompt-empowered, LLM-based pipeline that incorporates exogenous factors and textual prompts and is fine-tuned with LoRA. The approach achieves state-of-the-art performance on DiDi's real-world data, with consistent improvements on both Call and TSH. It is also supported by qualitative analyses and deployment results, including an embedding vector library and online A/B tests. The work demonstrates practical impact by enabling more accurate demand-supply forecasting, smarter subsidy allocation, and scalable integration of geospatial priors into production systems.
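The TL;DR mentions LoRA fine-tuning of the LLM forecaster. The sketch below shows the generic LoRA technique on a single linear layer. It is not the authors' code, and the layer shapes, rank, and scaling used in MVGR-Net are not given here. A LoRA-adapted layer keeps the pretrained weight W frozen and learns a low-rank update, computing h = W x + (alpha / r) B A x. The class and helper names are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration of generic LoRA, not MVGR-Net's implementation.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight                  # frozen pretrained weight
        d_out, d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank             # LoRA scaling factor alpha / r
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapted layer initially matches the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def trainable_params(self):
        """Only A and B are trained: r * (d_in + d_out) values."""
        return self.rank * (len(self.weight[0]) + len(self.weight))

    def forward(self, x):
        base = matvec(self.weight, x)              # W x
        delta = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = LoRALinear(W, rank=1, alpha=2.0)
    print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.0], identical to W x
    print(layer.trainable_params())        # 5, versus 6 entries in W
```

Because B starts at zero, fine-tuning begins from the pretrained model's behavior. Only the r(d_in + d_out) low-rank values are updated, which is what keeps adapting a large model to a new forecasting task affordable.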
Abstract
The proliferation of ride-hailing services has fundamentally transformed urban mobility patterns, making accurate ride-hailing forecasting crucial for optimizing passenger experience and urban transportation efficiency. However, ride-hailing forecasting is challenging because of geospatial heterogeneity and high susceptibility to external events. This paper proposes MVGR-Net (Multi-View Geospatial Representation Learning), a novel framework that addresses these challenges through a two-stage approach. In the pretraining stage, we learn comprehensive geospatial representations by integrating Points of Interest (POIs) and temporal mobility patterns. These representations capture regional characteristics from two views: a semantic-attribute view and a temporal-mobility-pattern view. The forecasting stage leverages these representations through a prompt-empowered framework that fine-tunes large language models (LLMs) while incorporating external events. Extensive experiments on DiDi's real-world datasets demonstrate that MVGR-Net achieves state-of-the-art performance.
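The pretraining stage describes each region from a semantic (POI) view and a temporal mobility view. The paper's learned encoders are not described in this section, so the sketch below substitutes a deliberately simple, hand-crafted stand-in. It turns a region's POI category counts and its 24-hour trip profile into L2-normalized vectors and fuses them by weighted concatenation. The category list, view weights, and function names are hypothetical assumptions for illustration only.

```python
import math

# Hypothetical POI category set; a real system would use a much richer taxonomy.
POI_CATEGORIES = ["food", "office", "residential", "transport"]


def l2_normalize(v):
    """Scale a vector to unit length (leave an all-zero vector unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def semantic_view(poi_counts):
    """Semantic view: POI category counts in a fixed category order."""
    return l2_normalize([float(poi_counts.get(c, 0)) for c in POI_CATEGORIES])


def mobility_view(hourly_trips):
    """Temporal view: a 24-hour profile of trip counts for the region."""
    if len(hourly_trips) != 24:
        raise ValueError("expected 24 hourly values")
    return l2_normalize([float(x) for x in hourly_trips])


def region_embedding(poi_counts, hourly_trips, w_sem=0.5, w_mob=0.5):
    """Fuse both views into one fixed-length vector by weighted concatenation."""
    sem = [w_sem * x for x in semantic_view(poi_counts)]
    mob = [w_mob * x for x in mobility_view(hourly_trips)]
    return sem + mob


def cosine(u, v):
    """Cosine similarity between two region embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    office_hours = [1] * 24
    for h in (8, 9, 17, 18):   # commute peaks
        office_hours[h] = 10
    home_hours = [1] * 24
    for h in (7, 21):          # leave early, return late
        home_hours[h] = 10
    a = region_embedding({"office": 40, "food": 20}, office_hours)
    b = region_embedding({"office": 35, "food": 25, "transport": 5}, office_hours)
    r = region_embedding({"residential": 50, "food": 10}, home_hours)
    print("office vs office:", round(cosine(a, b), 3))
    print("office vs residential:", round(cosine(a, r), 3))
```

With equal view weights and unit-normalized views, the cosine similarity of two fused embeddings equals the average of the two per-view similarities. As a result, two office districts with matching commute peaks score higher than an office district compared with a residential area. A learned multi-view encoder would replace both hand-crafted views. The interface would stay the same: one fixed-length vector per region that a downstream forecaster can consume.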
