Classifying Bicycle Infrastructure Using On-Bike Street-Level Images

Kal Backman, Ben Beck, Dana Kulić

TL;DR

The paper addresses the challenge of mapping city-wide cycling infrastructure using on-board street-level imagery. It introduces a hierarchical infrastructure classifier that processes image sequences with a ConvNeXt-V2 backbone, a latent encoder, a temporal self-attention module, and a decoder to output main and sub-class labels. The approach is trained on a large crowd-sourced Melbourne dataset with GPS-OSM labeling and demonstrates high accuracy (main class ~96%, sub-class ~95%) and robustness to extreme feature sparsity. It discusses labeling noise and limitations, and outlines avenues for extending to dynamic, crowd-sourced infrastructure maps to guide safer cycling networks.

Abstract

While cycling offers an attractive option for sustainable transportation, many potential cyclists are discouraged from taking up cycling due to the lack of suitable and safe infrastructure. Efficiently mapping cycling infrastructure across entire cities is necessary to advance our understanding of how to provide connected networks of high-quality infrastructure. Therefore, we propose a system capable of classifying available cycling infrastructure from on-bike smartphone camera data. The system receives an image sequence as input and analyzes it temporally to account for the sparsity of signage. The model outputs cycling infrastructure class labels defined by a hierarchical classification system. The data was collected by participant cyclists covering 7,006 km across the Greater Melbourne region and automatically labeled via a GPS and OpenStreetMap database matching algorithm. The proposed model achieved an accuracy of 95.38%, an increase of 7.55% over the non-temporal model. The model demonstrated robustness to extreme absence of image features, losing only 6.6% in accuracy when 90% of images were replaced with blank images. This work is the first to classify cycling infrastructure using only street-level imagery collected from bike-mounted mobile phone cameras, while demonstrating robustness to feature sparsity via long temporal sequence analysis.
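The feature-sparsity test mentioned in the abstract (replacing 90% of images with blank images) could be set up roughly as in the sketch below. This is only an illustration under stated assumptions: the abstract does not say how blank frames are chosen or encoded, so the random selection, the all-zero encoding, and the helper name `blank_out_frames` are hypothetical.

```python
import numpy as np

def blank_out_frames(frames, fraction, seed=0):
    """Return a copy of `frames` with a random `fraction` of frames zeroed.

    Assumptions (not from the paper): frames has shape (T, H, W, C),
    a "blank" image is all zeros, and frames are picked uniformly at random.
    """
    rng = np.random.default_rng(seed)
    n_frames = frames.shape[0]
    n_blank = int(round(fraction * n_frames))
    # Sample distinct frame indices to blank out.
    blank_idx = rng.choice(n_frames, size=n_blank, replace=False)
    out = frames.copy()
    out[blank_idx] = 0
    return out, np.sort(blank_idx)

# Toy sequence of 20 small all-ones "images".
sequence = np.ones((20, 4, 4, 3), dtype=np.float32)
sparse, blanked = blank_out_frames(sequence, fraction=0.9)

# Count frames that are now entirely blank.
n_blank_frames = int((sparse.reshape(20, -1).sum(axis=1) == 0).sum())
print(n_blank_frames, "of", sequence.shape[0], "frames blanked")
```

Feeding `sparse` rather than `sequence` to a sequence classifier and comparing accuracy would give a robustness figure of the kind the abstract reports.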

Paper Structure

This paper contains 19 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Overview of the infrastructure classification model. (Left) Estimating cycling infrastructure from a single image is difficult due to the temporal sparsity and occlusion of signage. (Right) Summary of the model’s architecture which consists of an image sequence input from which image features are extracted and compressed onto a latent vector.
  • Figure 2: Example images of the different cycling infrastructure classes and their associated groupings into main and sub classes.
  • Figure 3: Example of the smartphone setup attached to the front handlebars to record on-board street-view imagery.
  • Figure 4: Overview of the GPS to OpenStreetMap road segment assignment algorithm. (Left) Example trajectory of a cyclist. Each road segment is denoted with a coloured line indicating its respective infrastructure class. The start and end points of each road segment are indicated with a grey circle. (Middle) The to-be-classified GPS coordinate $P_i$ (highlighted in yellow) samples temporally nearby GPS coordinates (denoted in orange) to construct $\ell_{GPS}$, which is used to filter out non-parallel lines. (Right) Non-parallel lines and spatially distant lines are filtered out, while the remaining lines have their perpendicular and colinear distances ($D_{perp}$ & $D_{colin}$) computed. For the example shown, $\ell_1$, $\ell_2$ & $\ell_3$ are colinear to each other and thus share identical perpendicular distances: $D_{perp\text{-}1} = D_{perp\text{-}2} = D_{perp\text{-}3}$. As the projection of $P_i$, denoted as $P_{proj}$, lies on $\ell_2$, the colinear distance equals zero: $D_{colin\text{-}2} = 0$. As $P_{proj}$ lies outside of $\ell_1$ & $\ell_3$, their colinear distances equal $D_{colin\text{-}1}$ & $D_{colin\text{-}3}$ respectively.
  • Figure 5: Normalized confusion matrix of the proposed model’s main classes for the validation dataset.
  • ...and 1 more figures
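
The perpendicular and colinear distances described in the Figure 4 caption can be illustrated with a small geometric sketch. It assumes planar coordinates (for example, GPS points already projected to metres) and assumes that, when $P_{proj}$ falls outside a segment, $D_{colin}$ is the distance from $P_{proj}$ to the nearest segment endpoint; the caption does not state this explicitly. The function name `perp_and_colin_distance` is hypothetical and this is not the authors' implementation.

```python
import numpy as np

def perp_and_colin_distance(p, a, b):
    """Distances from point p to the road segment a-b (2D planar coords).

    Returns (d_perp, d_colin):
      d_perp  - distance from p to the infinite line through a and b
      d_colin - 0 if the projection of p lies on the segment, otherwise the
                distance from the projection to the nearest endpoint
                (an assumed definition, see the lead-in).
    """
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    seg = b - a
    length = float(np.linalg.norm(seg))
    unit = seg / length
    # Signed position of the projection along the segment, measured from a.
    t = float(np.dot(p - a, unit))
    p_proj = a + t * unit
    d_perp = float(np.linalg.norm(p - p_proj))
    d_colin = max(0.0, -t, t - length)
    return d_perp, d_colin

# Three colinear segments along the x-axis, mirroring the caption's example.
segments = {
    "l1": ((0.0, 0.0), (10.0, 0.0)),
    "l2": ((10.0, 0.0), (20.0, 0.0)),
    "l3": ((20.0, 0.0), (30.0, 0.0)),
}
p_i = (15.0, 3.0)
results = {name: perp_and_colin_distance(p_i, a, b)
           for name, (a, b) in segments.items()}
for name, (d_perp, d_colin) in results.items():
    print(name, d_perp, d_colin)
```

In this toy case all three segments share the same perpendicular distance, and only the middle segment has a colinear distance of zero, which matches the situation the caption describes for $\ell_1$, $\ell_2$ and $\ell_3$.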