Mapping Socio-Economic Divides with Urban Mobility Data

Yingche Liu, Mengyang Li

TL;DR

The paper addresses how megacity bike-sharing data can map urban socio-economic divides by bridging the data gap with an LLM-based enrichment of housing-price proxies. It deploys an interpretable Random Forest framework to decompose drivers of mobility and reports a strong link between local wealth and bike usage, quantified by $R^2=0.350$ on a held-out set and $MAPE=8.1\%$. Key findings include the club effect (spatial clustering of mobility in affluent areas), a functional dichotomy (utilitarian vs recreational usage), and an inverted U-shaped adoption curve with the urban middle class at the core. These results have practical implications for transportation equity and urban planning, offering a scraper-free, reproducible approach to diagnosing inequality using mobility data and informing targeted infrastructure and policy decisions. The study also outlines avenues for extending the framework to other cities and temporal dynamics.
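The two metrics quoted above follow standard definitions: $R^2 = 1 - SS_{res}/SS_{tot}$ and MAPE as the mean of absolute relative errors. The sketch below shows how they could be computed on held-out predictions. The function names and the sample values are illustrative assumptions; they are not the paper's code or data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (true for house prices).
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up house prices for illustration only (not from the paper).
actual = [50000.0, 60000.0, 80000.0, 100000.0]
predicted = [55000.0, 57000.0, 84000.0, 90000.0]

print(f"R^2  = {r_squared(actual, predicted):.3f}")
print(f"MAPE = {mape(actual, predicted):.1f}%")
```

The two numbers answer different questions: $R^2$ measures the share of variance in the target that the model explains, while MAPE measures the typical relative size of an individual error. This is why a moderate $R^2$ can coexist with a single-digit MAPE.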

Abstract

The massive digital footprints generated by bike-sharing systems in megacities like Shanghai offer a novel perspective on the urban socio-economic fabric. This study investigates whether these daily mobility patterns can quantitatively map the city's underlying social stratification. To overcome the persistent challenge of acquiring fine-grained socio-economic data, we constructed a multi-layered analytical dataset. We annotated 2,000 raw bike trips with local economic attributes, derived from a novel data enrichment methodology that employs a Large Language Model (LLM), and integrated contextual features of the built environment. A Random Forest model was then utilized as an interpretable framework to determine the key factors governing the relationship between mobility behavior and local economic status. The analysis reveals a compelling and unambiguous finding: a neighborhood's economic level, proxied by housing prices, is the single most dominant predictor of its bike-sharing patterns, substantially outweighing other geographic or temporal factors. This economic determinism manifests in three distinct ways: (1) a spatial clustering of resources, a phenomenon we term the "club effect", which concentrates mobility infrastructure and usage in affluent areas; (2) a functional dichotomy between necessity-driven, utilitarian usage in lower-income zones and flexible, recreational usage in wealthier ones; and (3) a nuanced inverted U-shaped adoption curve that identifies the urban middle class as the system's primary user base.

Paper Structure

This paper contains 19 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: The distribution of estimated house prices across all trip start locations. The right-skewed pattern is characteristic of urban economies and confirms the economic heterogeneity within our dataset, a prerequisite for studying socio-economic factors.
  • Figure 2: The spatial distribution of the 1,880 bike trip start locations in the final dataset. Each point is colored according to the LLM-estimated house price, illustrating the diverse economic coverage of the study area.
  • Figure 3: Bike activity levels across the gridded study area. Each bubble represents a grid cell, with its size and color corresponding to the number of trips originating within it (grid_trip_count). The map reveals a distinct spatial clustering of bike-sharing "hotspots," defining the city's primary mobility hubs.
  • Figure 4: Performance comparison of five machine learning models on the house price prediction task, evaluated on the test set. Random Forest achieved the highest R² score, indicating its superior ability to capture the complex, non-linear patterns in the data.
  • Figure 5: Scatter plot of Predicted vs. Actual house prices on the test set. The concentration of points along the diagonal line, along with the absence of systematic bias, demonstrates the Random Forest model's strong predictive accuracy and reliability.
  • ...and 5 more figures
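
The caption of Figure 3 refers to a per-cell count, grid_trip_count. As a minimal sketch of how such a count could be derived from trip start coordinates, the code below bins points into square cells and tallies them. The function name, the 0.01-degree cell size, and the sample coordinates are assumptions for illustration; the gridding scheme the paper actually uses is not stated in this excerpt.

```python
import math
from collections import Counter


def grid_trip_counts(starts, cell_deg=0.01):
    """Count trip starts per square grid cell.

    starts:   iterable of (lon, lat) pairs in decimal degrees.
    cell_deg: cell edge length in degrees (hypothetical default).
    Returns a Counter keyed by integer (column, row) cell indices.
    """
    counts = Counter()
    for lon, lat in starts:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


# Made-up start points, roughly in Shanghai's coordinate range.
starts = [
    (121.473, 31.232),
    (121.478, 31.236),  # falls in the same cell as the first point
    (121.501, 31.245),
]

counts = grid_trip_counts(starts)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Cells with high counts correspond to the "hotspots" the caption describes. A fixed cell size in degrees is the simplest choice, though cells then differ slightly in ground area with latitude.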