FIN: Fast Inference Network for Map Segmentation
Ruan Bispo, Tim Brophy, Reenu Mohandas, Anthony Scanlan, Ciarán Eising
TL;DR
This work tackles real-time map segmentation for autonomous driving by introducing FIN, a camera–radar fusion network that operates in bird's-eye-view (BEV) space. FIN combines a ResNet-50 image backbone, a PAN radar backbone, a radar-assisted BEV projection (RVT), cross-modal MDCA fusion, and a lightweight U-Net-based head, and is trained with a six-term loss set that balances accuracy and boundary precision. It reaches 53.5 mIoU on nuScenes while running at ~26 FPS on an NVIDIA A100, a 260% inference-time improvement over the strongest baselines, and it remains robust across challenging weather and lighting conditions. The results indicate that FIN can deliver high-fidelity, real-time map segmentation with balanced per-class results, supporting safer planning and trajectory prediction in dynamic driving environments. Occluded and distant regions remain challenging and are left for future work.
Abstract
Multi-sensor fusion is becoming increasingly common in autonomous vehicles because it offers a more robust alternative for several perception tasks. Each sensor contributes differently to data collection: camera-radar fusion offers a cost-effective solution by combining rich semantic information from cameras with accurate distance measurements from radar, without incurring excessive financial costs or overwhelming data processing requirements. Map segmentation is a critical task for enabling effective vehicle behaviour in its environment, yet it still struggles to achieve high accuracy while meeting real-time performance requirements. This work therefore presents a novel and efficient camera-radar map segmentation architecture in the bird's-eye-view (BEV) space. The model targets real-time operation while accounting for accuracy, per-class balance, and inference time. To accomplish this, we use an advanced loss set together with a new lightweight head. With these modifications, our approach achieves results comparable to large models, reaching 53.5 mIoU, while also setting a new benchmark for inference time, improving it by 260% over the strongest baseline models.
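For readers unfamiliar with the mIoU figure quoted above, the following is a minimal sketch of how a mean intersection-over-union score can be computed over per-class binary BEV masks. It is not the authors' evaluation code: the function names, the class names, and the assumption that predictions have already been thresholded to 0/1 grids are all illustrative.

```python
from typing import Dict, Sequence

# Assumption: each class is a binary BEV grid (1 = class present in that cell).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # A class absent from both masks has no defined IoU.
    return inter / union if union else float("nan")


def mean_iou(preds: Dict[str, Mask], targets: Dict[str, Mask]) -> float:
    """Average per-class IoU, reported on a 0-100 scale."""
    scores = [iou(preds[name], targets[name]) for name in targets]
    valid = [s for s in scores if s == s]  # NaN != NaN, so this drops undefined classes
    if not valid:
        return float("nan")
    return 100.0 * sum(valid) / len(valid)


# Hypothetical 2x2 grids for two illustrative map classes.
preds = {
    "drivable_area": [[1, 1], [0, 0]],
    "walkway": [[0, 1], [0, 1]],
}
targets = {
    "drivable_area": [[1, 0], [0, 0]],
    "walkway": [[0, 1], [0, 1]],
}
score = mean_iou(preds, targets)
```

Averaging per class rather than per cell means a small class such as a lane divider weighs as much as a large one such as the drivable area, which is why per-class balance affects the final score.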
