Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach
Mohit Gupta, Debjit Bhowmick, Meead Saberi, Shirui Pan, Ben Beck
TL;DR
The paper tackles the challenge of estimating link-level bicycling volumes in a sparsely observed urban network. It introduces a node-centric Graph Convolutional Network (GCN) that fuses OpenStreetMap bicycle-infrastructure data with Strava Metro counts for the City of Melbourne. It simulates data sparsity systematically from 0% to 99% and benchmarks the GCN against traditional models: linear regression (LR), support vector machines (SVM), and random forest (RF). The GCN performs strongly at low to moderate sparsity, remains robust up to 80%, and declines sharply at extreme sparsity. The study demonstrates the value of graph-structured modeling for capturing spatial dependencies in bicycle networks and offers practical insights for planners, while acknowledging biases in Strava data and the need to make the model more robust. As future work, the authors propose hybrid approaches and additional data sources to improve resilience to sparsity and to enable real-time, cross-city applications.
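To make the "node-centric" GCN idea concrete, here is a minimal NumPy sketch of a two-layer GCN forward pass for link-level regression. It rests on our own assumptions, not details given here: each road segment (link) is treated as a graph node, two segments are neighbours when they share an intersection, and the model outputs one volume estimate per node using the widely used symmetric-normalised propagation rule of Kipf and Welling. The function names (`normalized_adjacency`, `gcn_forward`), the toy four-link graph, the feature matrix, and the layer sizes are all hypothetical. The weights are random and untrained, so they illustrate shapes and data flow only.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])             # self-loops: each link keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degrees of A + I are always >= 1
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def gcn_forward(a_norm: np.ndarray, x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Two-layer GCN regression: ReLU hidden layer, linear output (one value per link)."""
    h = np.maximum(a_norm @ x @ w1, 0.0)  # aggregate neighbour features, transform, apply ReLU
    return a_norm @ h @ w2                # second aggregation; linear output for volume regression


# Hypothetical toy network: links 0-1-2-3 form a chain (consecutive links share an intersection).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # hypothetical per-link features (e.g. OSM bike-lane indicators)
w1 = rng.normal(size=(3, 8))  # untrained weights, for illustration only
w2 = rng.normal(size=(8, 1))

a_norm = normalized_adjacency(adj)
y_hat = gcn_forward(a_norm, x, w1, w2)
print(a_norm.round(3))
print(y_hat.shape)
```

Each layer blends a link's features with those of adjacent links, which is how a GCN can exploit spatial dependencies that per-link models such as LR, SVM, or RF ignore. In a real setup the weights would be learned by minimizing a regression loss computed only on links with observed counts.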
Abstract
Accurate bicycling volume estimation is crucial for informed decision-making and for planning future investments in bicycling infrastructure. Traditional link-level volume estimation models are effective for motorized traffic but face significant challenges in the bicycling context because of sparse data and the intricate nature of bicycling mobility patterns. To the best of our knowledge, this is the first study to use a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes and to systematically investigate how varying levels of data sparsity (0% to 99%), simulating real-world scenarios, affect model performance. We use Strava Metro data as the primary source of bicycling counts across 15,933 road segments (links) in the City of Melbourne, Australia. To evaluate the GCN model, we benchmark it against traditional machine learning models: linear regression, support vector machines, and random forest. Our results show that the GCN outperforms these models in predicting Annual Average Daily Bicycle (AADB) counts, demonstrating its ability to capture the spatial dependencies inherent in bicycle traffic networks. While the GCN remains robust up to 80% sparsity, its performance declines sharply beyond this threshold, highlighting the challenges of extreme data sparsity. These findings underscore the potential of GCNs to enhance bicycling volume estimation and emphasize the need for further research on methods that improve model resilience under high-sparsity conditions. They also offer valuable insights for city planners aiming to improve bicycling infrastructure and promote sustainable transportation.
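The exact procedure for simulating sparsity is not spelled out here. One plausible scheme, sketched below under our own assumptions, holds out a fixed fraction of the 15,933 links as a test set. It then hides a `sparsity` fraction of the remaining links' AADB labels from training. The function names `sparsity_split` and `train_mask`, the 20% test fraction, and the seed are hypothetical choices, not the authors' settings.

```python
import numpy as np


def sparsity_split(n_links: int, sparsity: float, test_frac: float = 0.2, seed: int = 0):
    """Split link indices into (train_idx, test_idx) for a given sparsity level.

    Assumption (hypothetical scheme): `test_frac` of links is held out for evaluation,
    and `sparsity` is the fraction of the remaining links whose counts are hidden.
    The same seed yields the same test set at every sparsity level.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    perm = np.random.default_rng(seed).permutation(n_links)
    n_test = int(round(test_frac * n_links))
    test_idx, pool = perm[:n_test], perm[n_test:]
    # Keep at least one labelled link so a model can still be fitted at 99% sparsity.
    n_keep = max(1, int(round((1.0 - sparsity) * len(pool))))
    return pool[:n_keep], test_idx


def train_mask(n_links: int, train_idx: np.ndarray) -> np.ndarray:
    """Boolean mask for a masked loss: a GCN sees every link's features but only these labels."""
    mask = np.zeros(n_links, dtype=bool)
    mask[train_idx] = True
    return mask


N_LINKS = 15_933  # road segments in the City of Melbourne network
for s in (0.0, 0.5, 0.8, 0.99):
    train_idx, test_idx = sparsity_split(N_LINKS, s)
    print(f"sparsity={s:.2f}  labelled train links={len(train_idx):5d}  test links={len(test_idx)}")
```

Every sparsity level takes a prefix of the same permutation, so the labelled training sets are nested and the test set never changes. Performance differences across levels therefore reflect the amount of labelled data rather than a different evaluation split. A GCN would apply `train_mask` to its loss, while the baselines would simply be fitted on the `train_idx` rows.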
