TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

Yan Xia; Yunxiang Lu; Rui Song; Oussema Dhaouadi; João F. Henriques; Daniel Cremers

TL;DR

TrafficLoc localizes traffic surveillance cameras within a 3D reference map by learning coarse-to-fine image-to-point-cloud registration with geometry-guided cross-modal attention. It introduces a Geometry-guided Attention Loss (GAL), Inter-intra Contrastive Learning (ICL), and Dense Training Alignment (DTA) to strengthen 2D-3D correspondences under large viewpoint changes, and it recovers the 6-DoF camera pose from these correspondences with EPnP-RANSAC. The approach is validated on the newly proposed Carla Intersection dataset (75 intersections across 8 worlds) and on KITTI and nuScenes, achieving state-of-the-art localization accuracy and improved cross-domain performance, including on challenging unseen scenes. The work provides a practical, scalable framework for cooperative perception in city-scale camera networks, with the Carla Intersection dataset and supplementary materials intended to facilitate further research.
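The EPnP-RANSAC step turns predicted 2D-3D correspondences into a 6-DoF pose: RANSAC typically fits candidate poses to small random subsets of matches with EPnP and keeps the hypothesis that explains the most matches. The sketch below shows only the scoring part of that loop, a reprojection-error inlier test in plain Python. The function names, the world-to-camera pose convention, the zero-skew pinhole model and the 2-pixel threshold are illustrative assumptions, not details taken from TrafficLoc. In practice, a library routine such as OpenCV's `cv2.solvePnPRansac` with `flags=cv2.SOLVEPNP_EPNP` supplies both the minimal solver and the sampling loop.

```python
import math

# Assumed conventions (illustrative, not from the paper): R is a 3x3
# world-to-camera rotation, t a translation, K a zero-skew pinhole
# intrinsic matrix, and a world point X maps to x_cam = R @ X + t.


def project(point, R, t, K):
    """Project a 3D world point to pixel coordinates, or None if behind the camera."""
    # Transform the point into the camera frame.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Perspective division followed by the intrinsics (fx, fy, cx, cy).
    u = K[0][0] * xc[0] / xc[2] + K[0][2]
    v = K[1][1] * xc[1] / xc[2] + K[1][2]
    return (u, v)


def count_inliers(points_3d, pixels, R, t, K, threshold_px=2.0):
    """Return indices of 2D-3D matches whose reprojection error is below threshold_px."""
    inliers = []
    for idx, (X, (u_obs, v_obs)) in enumerate(zip(points_3d, pixels)):
        proj = project(X, R, t, K)
        if proj is None:
            continue  # points behind the camera cannot be inliers
        error = math.hypot(proj[0] - u_obs, proj[1] - v_obs)
        if error < threshold_px:
            inliers.append(idx)
    return inliers


# Usage: identity pose, 500 px focal length, principal point at the center of a 640x480 image.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
points_3d = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, -1.0, 5.0), (0.0, 0.0, -5.0)]
pixels = [(320.0, 240.0), (370.0, 240.0), (400.0, 400.0), (320.0, 240.0)]
print(count_inliers(points_3d, pixels, R, t, K))  # [0, 1]
```

Raising `threshold_px` tolerates noisier matches at the cost of accepting more outliers into the final pose fit.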

Abstract

We tackle the problem of localizing traffic cameras within a 3D reference map and propose TrafficLoc, a novel image-to-point cloud registration (I2P) method that matches features in a coarse-to-fine fashion. To overcome the lack of large-scale real-world intersection datasets, we first introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in Carla. We find that current I2P methods struggle with cross-modal matching under large viewpoint differences, especially at traffic intersections. TrafficLoc therefore employs a novel Geometry-guided Attention Loss (GAL) that focuses attention only on the geometrically corresponding regions across viewpoints during 2D-3D feature fusion. To address feature inconsistency between paired image patches and point groups, we further propose Inter-intra Contrastive Learning (ICL), which better separates 2D patch and 3D group features within each modality, and introduce Dense Training Alignment (DTA) with soft-argmax to improve position regression. Extensive experiments show that TrafficLoc improves over SOTA I2P methods by up to 86% on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new SOTA performance on the KITTI and nuScenes datasets, demonstrating its superiority for both in-vehicle and traffic cameras. Our project page is publicly available at https://tum-luk.github.io/projects/trafficloc/.
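Soft-argmax is what makes position regression differentiable: instead of taking the hard argmax of a score map, it takes the softmax-weighted average of the cell coordinates, so gradients reach every score. Below is a minimal, dependency-free sketch over a 2D score map, for example the similarities between one 3D point group and the pixels of its matched image patch. That interpretation, the function name and the `temperature` parameter are assumptions for illustration; the exact form of DTA in TrafficLoc is not described here.

```python
import math


def soft_argmax_2d(scores, temperature=1.0):
    """Differentiable estimate of the (row, col) peak of a 2D score map.

    Applies a softmax to scores / temperature over all cells and returns the
    probability-weighted mean of the cell coordinates.
    """
    n_cols = len(scores[0])
    flat = [s / temperature for row in scores for s in row]
    peak = max(flat)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in flat]
    total = sum(weights)
    exp_row = 0.0
    exp_col = 0.0
    for k, w in enumerate(weights):
        r, c = divmod(k, n_cols)  # recover the 2D cell index from the flat index
        exp_row += r * w / total
        exp_col += c * w / total
    return exp_row, exp_col


# Usage: a sharp peak at row 2, column 0 versus a flat map.
sharp = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [100.0, 0.0, 0.0]]
flat_map = [[1.0] * 3 for _ in range(3)]
print(soft_argmax_2d(sharp))     # approximately (2.0, 0.0)
print(soft_argmax_2d(flat_map))  # approximately (1.0, 1.0)
```

Lowering `temperature` sharpens the softmax, so the estimate approaches the hard argmax while staying differentiable; a training pipeline would apply the same arithmetic to tensors in an autodiff framework.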

