Evaluating the effects of Data Sparsity on the Link-level Bicycling Volume Estimation: A Graph Convolutional Neural Network Approach

Mohit Gupta, Debjit Bhowmick, Meead Saberi, Shirui Pan, Ben Beck

TL;DR

The paper tackles the challenge of estimating link-level bicycling volumes in a sparsely observed urban network by introducing a node-centric Graph Convolutional Network that fuses OpenStreetMap bike infrastructure with Strava Metro counts for the City of Melbourne. It systematically simulates data sparsity from 0% to 99% and benchmarks the GCN against traditional models (LR, SVM, RF), showing strong performance of the GCN at low to moderate sparsity and a sharp decline at extreme sparsity. The study demonstrates the value of graph-structured modeling for capturing spatial dependencies in bicycle networks and provides actionable insights for planners, while acknowledging Strava data biases and the need for robustness enhancements. Future work proposes hybrid approaches and additional data sources to improve resilience to sparsity and to enable real-time, cross-city applicability.

Abstract

Accurate bicycling volume estimation is crucial for making informed decisions about future investments in bicycling infrastructure. Traditional link-level volume estimation models are effective for motorized traffic but face significant challenges in the bicycling context because of sparse data and the intricate nature of bicycling mobility patterns. To the best of our knowledge, we present the first study to use a Graph Convolutional Network (GCN) architecture to model link-level bicycling volumes and to systematically investigate how varying levels of data sparsity (0% to 99%) affect model performance, simulating real-world scenarios. We use Strava Metro data as the primary source of bicycling counts across 15,933 road segments/links in the City of Melbourne, Australia. To evaluate the effectiveness of the GCN model, we benchmark it against traditional machine learning models: linear regression, support vector machines, and random forest. Our results show that the GCN model outperforms these traditional models in predicting Annual Average Daily Bicycle (AADB) counts, demonstrating its ability to capture the spatial dependencies inherent in bicycle traffic networks. While the GCN remains robust up to 80% sparsity, its performance declines sharply beyond this threshold, highlighting the challenges of extreme data sparsity. These findings underscore the potential of GCNs for bicycling volume estimation, while also emphasizing the need for further research on methods to improve model resilience under high-sparsity conditions. Our findings offer valuable insights for city planners aiming to improve bicycling infrastructure and promote sustainable transportation.

Paper Structure

This paper contains 31 sections, 8 equations, 24 figures, 5 tables.

Figures (24)

  • Figure 1: Comparison of Strava AADB Count Distribution Before and After Box-Cox Transformation
  • Figure 2: Overview of the methodology for link-level bicycling volume prediction using a Graph Convolutional Network (GCN) approach. The data section shows the integration of Strava Metro data (crowd-sourced bicycle volume data) and OpenStreetMap (OSM) data (road infrastructure data), linked via unique OSM IDs. The graph construction phase includes: (a) mapping the bicycle network within the City of Melbourne, (b) demonstrating graph inversion on a small area, where road segments are converted into nodes and intersections into edges, and (c) forming the final node-centric graph representation. The model architecture section shows GCN Configuration G (discussed in the GCN configuration and GCN configuration results sections).
  • Figure 3: Data split and sparsity simulation for the training process. The constructed graph consists of 15,933 nodes (road segments/links), divided into training (80% - 12,746 nodes), validation (5% - 797 nodes), and testing (15% - 2,390 nodes) sets. To simulate varying sparsity levels, the training data is progressively reduced from 0% sparsity (12,746 nodes) to 99% sparsity (127 nodes), enabling a systematic evaluation of model performance under different levels of data sparsity.
  • Figure 4: Bicycle Network in City of Melbourne
  • Figure 5: Sparsity - 0%
  • ...and 19 more figures
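
The data split and sparsity simulation described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `split_nodes` and `simulate_sparsity`, the random seed, the use of shuffled integer node IDs, and the floor rounding rule are all assumptions. Only the node counts (15,933 total; 12,746 / 797 / 2,390 for training, validation, and testing; 127 training nodes at 99% sparsity) come from the caption. The sketch also assumes the retained training nodes are nested across sparsity levels, which the caption does not state.

```python
import random

# Counts taken from the Figure 3 caption.
N_LINKS = 15_933
SPLIT_COUNTS = {"train": 12_746, "val": 797, "test": 2_390}


def split_nodes(n_nodes, counts, seed=0):
    """Randomly partition node IDs 0..n_nodes-1 into disjoint named splits.

    Hypothetical helper: the paper's actual sampling procedure is not
    given in this summary.
    """
    if sum(counts.values()) != n_nodes:
        raise ValueError("split sizes must sum to the number of nodes")
    ids = list(range(n_nodes))
    random.Random(seed).shuffle(ids)
    splits, start = {}, 0
    for name, size in counts.items():
        splits[name] = ids[start:start + size]
        start += size
    return splits


def simulate_sparsity(train_ids, sparsity_pct):
    """Keep only the labeled training nodes left at a given sparsity level.

    sparsity_pct is an integer percentage (0..99). Integer arithmetic with
    floor division is an assumed rounding rule; it reproduces the caption's
    127 nodes at 99% sparsity. Taking a prefix of the shuffled list makes
    the retained sets nested across levels (also an assumption).
    """
    if not 0 <= sparsity_pct < 100:
        raise ValueError("sparsity_pct must be in [0, 100)")
    n_keep = len(train_ids) * (100 - sparsity_pct) // 100
    return train_ids[:n_keep]


splits = split_nodes(N_LINKS, SPLIT_COUNTS)
for pct in (0, 50, 80, 99):
    kept = simulate_sparsity(splits["train"], pct)
    print(f"sparsity {pct:>2}%: {len(kept)} labeled training nodes")
```

In this sketch the validation and test sets are left untouched at every sparsity level, so performance at different levels is measured on the same held-out nodes, which is consistent with the caption's statement that only the training data is reduced.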