From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng, Feifan Wang, Song Wang
TL;DR
This work tackles joint camera and subject registration in the bird's eye view (BEV) without explicit camera calibration. It introduces an end-to-end framework that first performs BEV-based subject localization (via VTM), then geometric camera pose estimation (via SAM), followed by a geometry- and appearance-driven registration and fusion step, augmented by self-supervised subject association. The authors build a large-scale synthetic dataset, CSRD, and report strong cross-view and cross-domain performance, including a real-world evaluation, with ablations confirming the contributions of the pretrained components and orientation supervision. The approach removes the need for BEV inputs or calibration data in many practical scenarios, enabling multi-view human localization and camera localization in a unified BEV.
Abstract
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration. This problem is very challenging because its only input is several RGB images of a multi-person scene taken from different first-person views (FPVs), without a BEV image or the calibration of the FPVs, while the output is a unified plane with the localization and orientation of both the subjects and the cameras in a BEV. We propose an end-to-end framework to solve this problem, whose main idea can be divided into the following parts: i) creating a view-transform subject detection module to transform each FPV to a virtual BEV including the localization and orientation of each pedestrian, ii) deriving a geometric-transformation-based method to estimate camera localization and view direction, i.e., the camera registration in a unified BEV, iii) making use of spatial and appearance information to aggregate the subjects into the unified BEV. We collect a new large-scale synthetic dataset with rich annotations for evaluation. The experimental results show the remarkable effectiveness of our proposed method.
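To make part ii) concrete, the following is a minimal sketch of one way camera registration could work once subjects have been matched across two per-view BEVs. It is not the authors' method: it swaps in a standard least-squares 2D rigid alignment (Procrustes) over matched subject positions. The function names `register_camera` and `camera_b_pose_in_a`, and the convention that each virtual BEV places its own camera at the origin looking along +y, are assumptions made for illustration only.

```python
import math

def register_camera(points_a, points_b):
    """Least-squares 2D rigid transform mapping view-B BEV points into view A's frame.

    points_a[i] and points_b[i] are (x, y) positions of the same subject in the
    two camera-centred BEVs (assumed to be already associated).
    Returns (theta, (tx, ty)) such that p_a ~= R(theta) @ p_b + t.
    """
    n = len(points_a)
    if n != len(points_b) or n < 2:
        raise ValueError("need at least two matched subjects")

    # Centroids of both point sets.
    ax = sum(p[0] for p in points_a) / n
    ay = sum(p[1] for p in points_a) / n
    bx = sum(p[0] for p in points_b) / n
    by = sum(p[1] for p in points_b) / n

    # Accumulate cross and dot products of the centred pairs; their ratio
    # gives the optimal rotation angle in closed form.
    cross = 0.0
    dot = 0.0
    for (pax, pay), (pbx, pby) in zip(points_a, points_b):
        ux, uy = pbx - bx, pby - by
        vx, vy = pax - ax, pay - ay
        cross += ux * vy - uy * vx
        dot += ux * vx + uy * vy
    theta = math.atan2(cross, dot)

    # Translation aligns the rotated B centroid with the A centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = ax - (c * bx - s * by)
    ty = ay - (s * bx + c * by)
    return theta, (tx, ty)

def camera_b_pose_in_a(theta, t):
    """Camera B's location and view direction expressed in view A's BEV.

    Assumes camera B sits at the origin of its own BEV looking along +y,
    so its location maps to t and its direction to R(theta) @ (0, 1).
    """
    direction = (-math.sin(theta), math.cos(theta))
    return t, direction
```

With camera A fixed as the reference frame, calling `register_camera` on the matched subject positions and passing the result to `camera_b_pose_in_a` yields camera B's position and view direction in the unified BEV. In practice the matches would come from the association step, and noisy detections would call for a robust variant of this fit.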
