MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring
Arpitsinh Vaghela, Duo Lu, Aayush Atul Verma, Bharatesh Chakravarthi, Hua Wei, Yezhou Yang
TL;DR
This work tackles occlusion and limited field of view in roadside traffic perception by proposing a BEV occupancy framework that fuses data from multiple cameras. It compares a late fusion baseline with three early fusion methods and enhances generalization through static background integration, using a synthetic CARLA dataset and rigorous ablations on occupancy map resolution. The approach demonstrates strong improvements over baselines, reveals the value of multi-camera inputs, and shows promising sim-to-real transfer via zero-shot and few-shot fine-tuning on real-world data. The contributions include a scalable dataset, multiple fusion strategies, and practical insights for deploying BEV occupancy in traffic monitoring and management.
Abstract
Single-camera 3D perception for traffic monitoring faces significant challenges due to occlusion and a limited field of view. Moreover, fusing information from multiple cameras at the image feature level is difficult because the cameras observe the scene from different view angles. The need for practical implementation and compatibility with existing traffic infrastructure compounds these challenges. To address these issues, this paper introduces a novel Bird's-Eye-View (BEV) road occupancy detection framework that leverages multiple roadside cameras. To facilitate the framework's development and evaluation, a synthetic dataset featuring diverse scenes and varying camera configurations was generated using the CARLA simulator. One late fusion method and three early fusion methods were implemented within the proposed framework, with performance further enhanced by integrating static backgrounds. Extensive evaluations were conducted to analyze the impact of multi-camera inputs and varying BEV occupancy map sizes on model performance. Additionally, a real-world data collection pipeline was developed to assess the model's ability to generalize to real-world environments. The sim-to-real capabilities of the model were evaluated using zero-shot inference and few-shot fine-tuning, demonstrating its potential for practical application. This research aims to advance perception systems in traffic monitoring, contributing to improved traffic management, operational efficiency, and road safety.
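
To make the idea of late fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes each camera has already produced its own BEV occupancy probability grid on a shared ground-plane grid, and it merges them with an element-wise maximum. The abstract does not state the fusion operator, so the maximum is an illustrative assumption, as are the function names `late_fuse_max`, `binarize`, and `iou` and the 0.5 threshold.

```python
from typing import List

Grid = List[List[float]]


def late_fuse_max(per_camera_maps: List[Grid]) -> Grid:
    """Merge per-camera BEV occupancy grids with an element-wise max.

    Assumption: every grid covers the same ground-plane region at the
    same resolution, so cell (r, c) refers to the same road patch in
    every camera's output.
    """
    if not per_camera_maps:
        raise ValueError("need at least one camera map")
    rows, cols = len(per_camera_maps[0]), len(per_camera_maps[0][0])
    for grid in per_camera_maps:
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError("all maps must share the same BEV grid size")
    return [
        [max(grid[r][c] for grid in per_camera_maps) for c in range(cols)]
        for r in range(rows)
    ]


def binarize(grid: Grid, threshold: float = 0.5) -> List[List[int]]:
    """Turn occupancy probabilities into a 0/1 occupancy map."""
    return [[1 if p >= threshold else 0 for p in row] for row in grid]


def iou(pred: List[List[int]], target: List[List[int]]) -> float:
    """Intersection over union of two binary occupancy maps."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return inter / union if union else 1.0


# Toy 2x2 scene: each camera sees only one of the two occupied cells.
cam_a = [[0.9, 0.1], [0.0, 0.2]]
cam_b = [[0.0, 0.1], [0.0, 0.8]]
ground_truth = [[1, 0], [0, 1]]

fused = late_fuse_max([cam_a, cam_b])
single_cam_iou = iou(binarize(cam_a), ground_truth)
fused_iou = iou(binarize(fused), ground_truth)
```

In this toy scene, a cell hidden from one camera is recovered from the other, which is the occlusion argument the abstract makes for multi-camera input. An early fusion method would instead combine information before the occupancy prediction is made, which this sketch does not attempt to model.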
