A Benchmark for Cycling Close Pass Detection from Video Streams
Mingjie Li, Ben Beck, Tharindu Rathnayake, Lingheng Meng, Zijue Chen, Akansel Cosgun, Xiaojun Chang, Dana Kulić
TL;DR
This work introduces Cyc-CP, a benchmark for detecting cycling close passes from video streams, and defines two CP detection tasks: scene-level (clip-level presence) and instance-level (which vehicle causes the CP). It combines a synthetic CARLA dataset with real-world VOC data and evaluates four benchmark models, including traditional video architectures (I3D, CNN+LSTM) and a monocular 3D detector-based framework (ICD), with additional exploration of a large multimodal model (InternVideo 2.5) via prompts. On the real-world VOC data, scene-level and instance-level detections achieve $88.13\%$ and $84.60\%$ accuracy, respectively, while experiments show that RGB-only inputs generally outperform optical-flow-enhanced configurations and that alternating or finetuning strategies improve instance-level performance. The benchmark is released openly to accelerate CP detection research and inform road safety policy, with future work aiming to extend beyond CP events, incorporate additional sensors, and enhance data diversity and robustness.
Abstract
Cycling is a healthy and sustainable mode of transport. However, interactions with motor vehicles remain a key barrier to increased cycling participation. The ability to detect potentially dangerous interactions from on-bike sensing could provide important information to riders and policymakers. A key influence on rider comfort and safety is close passes, i.e., when a vehicle narrowly passes a cyclist. In this paper, we introduce a novel benchmark, called Cyc-CP, for close pass (CP) event detection from video streams. The task is formulated as two problem categories: scene-level and instance-level. Scene-level detection determines whether a CP event is present in a given video clip. Instance-level detection identifies the specific vehicle in the scene that causes the CP event. To address these tasks, we introduce four benchmark models, each built on deep-learning methods. To train and evaluate these models, we developed a synthetic dataset and collected a real-world dataset. The benchmark evaluations show that the models achieve an accuracy of 88.13\% for scene-level detection and 84.60\% for instance-level detection on the real-world dataset. We envision this benchmark as a test-bed to accelerate CP detection and facilitate interaction between the fields of road safety, intelligent transportation systems and artificial intelligence. Both the benchmark datasets and detection models will be available at https://github.com/SustainableMobility/cyc-cp to facilitate experimental reproducibility and encourage more in-depth research in the field.
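The relationship between the two task levels can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the Cyc-CP release: the `VehicleTrack` and `Clip` structures, the function names, and in particular the rule that a clip counts as a scene-level positive whenever at least one of its vehicles is labelled as a close pass.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VehicleTrack:
    """One vehicle observed in a clip (hypothetical structure)."""
    track_id: int
    is_close_pass: bool  # instance-level label for this vehicle


@dataclass
class Clip:
    """A short video clip with its annotated vehicles (hypothetical structure)."""
    clip_id: str
    vehicles: List[VehicleTrack]


def scene_label(clip: Clip) -> bool:
    """Scene-level label, assumed True if any vehicle in the clip is a close pass."""
    return any(v.is_close_pass for v in clip.vehicles)


def instance_labels(clip: Clip) -> List[int]:
    """Instance-level answer: IDs of the vehicles labelled as close passes."""
    return [v.track_id for v in clip.vehicles if v.is_close_pass]


def accuracy(predictions: List[bool], targets: List[bool]) -> float:
    """Fraction of predictions that match their targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    clip = Clip("demo", [VehicleTrack(1, False), VehicleTrack(2, True)])
    print(scene_label(clip))      # scene-level: is there any CP in the clip?
    print(instance_labels(clip))  # instance-level: which vehicle(s) caused it?
```

Under this reading, a scene-level model only has to answer the yes/no question per clip, while an instance-level model must additionally single out the responsible vehicle, which is why the two tasks are scored separately.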
