Mapping Socio-Economic Divides with Urban Mobility Data
Yingche Liu, Mengyang Li
TL;DR
The paper asks whether bike-sharing data from a megacity can map urban socio-economic divides, and it closes the socio-economic data gap by using an LLM to enrich trips with housing-price proxies. An interpretable Random Forest framework decomposes the drivers of mobility and reports a strong link between local wealth and bike usage, quantified by $R^2=0.350$ on a held-out set and a MAPE of $8.1\%$. Key findings are the club effect (spatial clustering of mobility in affluent areas), a functional dichotomy (utilitarian versus recreational usage), and an inverted U-shaped adoption curve with the urban middle class at its peak. The results bear on transportation equity and urban planning: they offer a scraper-free, reproducible way to diagnose inequality from mobility data and to inform targeted infrastructure and policy decisions. The study also outlines how the framework could be extended to other cities and to temporal dynamics.
Abstract
The massive digital footprints generated by bike-sharing systems in megacities like Shanghai offer a novel perspective on the urban socio-economic fabric. This study investigates whether these daily mobility patterns can quantitatively map the city's underlying social stratification. To overcome the persistent challenge of acquiring fine-grained socio-economic data, we constructed a multi-layered analytical dataset: we annotated 2,000 raw bike trips with local economic attributes, derived from a data enrichment methodology that employs a Large Language Model (LLM), and integrated contextual features of the built environment. We then used a Random Forest model as an interpretable framework to determine the key factors governing the relationship between mobility behavior and local economic status. The analysis yields a clear finding: a neighborhood's economic level, proxied by housing prices, is the dominant predictor of its bike-sharing patterns, substantially outweighing other geographic or temporal factors. This economic determinism manifests in three distinct ways: (1) a spatial clustering of resources, a phenomenon we term the \textit{club effect}, which concentrates mobility infrastructure and usage in affluent areas; (2) a functional dichotomy between necessity-driven, utilitarian usage in lower-income zones and flexible, recreational usage in wealthier ones; and (3) a nuanced inverted U-shaped adoption curve that identifies the urban middle class as the system's primary user base.
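To make the modeling step concrete, the following is a minimal sketch of how a Random Forest can rank predictors and report held-out $R^2$ and MAPE. It is not the authors' pipeline: the data are synthetic, the feature names (`housing_price`, `hour`, `dist_to_center_km`), the target (`usage`), and the inverted-U relationship built into it are all assumptions chosen for illustration, and impurity-based importance is only one of several possible importance measures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000  # same order as the annotated trips; all values below are synthetic

# Hypothetical features (names and ranges are assumptions, not from the paper).
housing_price = rng.uniform(30, 150, n)      # price proxy, arbitrary units
hour = rng.integers(0, 24, n).astype(float)  # hour of day of the trip
dist_to_center_km = rng.uniform(0, 30, n)    # distance to the city center

feature_names = ["housing_price", "hour", "dist_to_center_km"]
X = np.column_stack([housing_price, hour, dist_to_center_km])

# Hypothetical target: an inverted U in housing price (peak at mid-range),
# weak temporal and geographic effects, and Gaussian noise.
usage = (
    30.0
    - 0.004 * (housing_price - 90.0) ** 2
    + 0.05 * hour
    - 0.05 * dist_to_center_km
    + rng.normal(0.0, 1.5, n)
)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, usage, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
mape = mean_absolute_percentage_error(y_test, pred)  # a fraction, not a percent

# Impurity-based importances sum to 1; sort from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"held-out R^2 = {r2:.3f}, MAPE = {100 * mape:.1f}%")
for name, score in ranked:
    print(f"{name:>18s}: {score:.3f}")
```

Because the synthetic target is driven mostly by the price term, `housing_price` ranks first here by construction; on real data the ranking is an empirical result, and a permutation-based importance computed on the held-out set would be a reasonable cross-check.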
