High and Low Resolution Tradeoffs in Roadside Multimodal Sensing
Shaozu Ding, Yihong Tang, Marco De Vincenzi, Dajiang Suo
TL;DR
The paper tackles the cost–performance tradeoff in roadside sensing by coupling an integer-programming-based sensor placement framework with a biologically inspired multimodal fusion pipeline. It demonstrates that a configuration of two low-resolution LiDARs and one velocity-rich 4D radar can achieve comparable or superior perception at lower cost than a single high-resolution LiDAR, with notable improvements in pedestrian detection and overall mAP across multiple architectures. The proposed ex-ante evaluation framework and open benchmark enable fair, data-driven comparisons of multimodal deployments and fusion strategies, promoting scalable and economical infrastructure-based perception for road safety. These findings indicate that the information richness of velocity cues can compensate for reduced spatial resolution, challenging the assumption that higher resolution is always better for roadside sensing.
Abstract
Balancing cost and performance is crucial when choosing between high- and low-resolution point-cloud roadside sensors. For example, LiDAR delivers dense point clouds, while 4D millimeter-wave radar, though spatially sparser, embeds velocity cues that help distinguish objects and comes at a lower price. However, the sensor placement strategy influences point-cloud density and distribution across the coverage area. Compounding this first challenge, different sensor mixtures often demand distinct neural network architectures to maximize their complementary strengths. Without an evaluation framework that establishes a benchmark for comparison, it is imprudent to claim that marginal gains result from higher resolution and new sensing modalities rather than from the algorithms. We present an ex-ante evaluation that addresses both challenges. First, we developed a simulation tool that builds on integer programming to automatically compare different sensor placement strategies jointly on coverage and cost. Second, inspired by human multi-sensory integration, we propose a modular framework to assess whether reductions in spatial resolution can be compensated by informational richness in detecting traffic participants. Extensive experiments with the proposed framework show that fusing velocity-encoded radar with low-resolution LiDAR yields marked gains (14 percent AP for pedestrians and an overall mAP improvement of 1.5 percent across six categories) at lower cost than high-resolution LiDAR alone. Notably, these gains hold regardless of the specific deep neural modules employed in our framework. The result challenges the prevailing assumption that high-resolution sensors are always superior to low-resolution alternatives.
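To make the placement idea concrete, the following is a minimal sketch of a 0-1 integer program that picks sensors to maximize coverage under a cost budget. It is not the authors' simulation tool: the objective, the tie-breaking rule, the function name `best_placement`, the sensor names, the costs, and the coverage sets are all hypothetical, and the tiny instance is solved by exhaustive enumeration of the binary decision vector rather than by a real integer-programming solver.

```python
from itertools import product

# Hypothetical toy instance: 10 grid cells and four candidate sensors.
# Costs and coverage sets are made up for illustration only.
CELLS = set(range(10))
CANDIDATES = [
    {"name": "hi_lidar_A", "cost": 10, "covers": {0, 1, 2, 3, 4, 5, 6, 7}},
    {"name": "lo_lidar_A", "cost": 3, "covers": {0, 1, 2, 3, 4}},
    {"name": "lo_lidar_B", "cost": 3, "covers": {5, 6, 7, 8}},
    {"name": "radar_C", "cost": 2, "covers": {3, 4, 5, 8, 9}},
]


def best_placement(candidates, cells, budget):
    """Enumerate every binary selection x and keep the best feasible one.

    Feasible: total cost <= budget (budget is assumed non-negative).
    Best: most covered cells, with ties broken by lower total cost.
    Returns (chosen sensor names, covered cell count, total cost).
    """
    best_key, best_choice = None, ()
    for x in product((0, 1), repeat=len(candidates)):
        chosen = [c for c, xi in zip(candidates, x) if xi]
        cost = sum(c["cost"] for c in chosen)
        if cost > budget:
            continue  # violates the budget constraint
        covered = set().union(*(c["covers"] for c in chosen)) & cells
        key = (len(covered), -cost)
        if best_key is None or key > best_key:
            best_key = key
            best_choice = tuple(c["name"] for c in chosen)
    return best_choice, best_key[0], -best_key[1]


result = best_placement(CANDIDATES, CELLS, budget=10)
print(result)  # (('lo_lidar_A', 'lo_lidar_B', 'radar_C'), 10, 8)
```

In this made-up instance the three cheaper sensors together cover all ten cells at a total cost of 8, while the single expensive sensor covers eight cells at a cost of 10. Enumeration scales as 2^n in the number of candidates, so a realistic deployment would hand the same formulation to a dedicated integer-programming solver.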
