YOWO: You Only Walk Once to Jointly Map An Indoor Scene and Register Ceiling-mounted Cameras

Fan Yang, Sosuke Yamao, Ikuo Kusajima, Atsunori Moteki, Shoichi Masui, Shan Jiang

TL;DR

YOWO addresses the challenge of jointly mapping an indoor scene and registering ceiling-mounted cameras by leveraging a single walk of a mobile agent equipped with an ego RGB-D camera and synchronized observations from CMCs. It combines ego-camera SLAM-derived world trajectories with CMC-derived pseudo-scale trajectories and mobile keypoints, then aligns CMC poses to the world layout via a tailored spatiotemporal registration and a factor-graph-based joint optimization. The method demonstrates robust performance gains over separate baselines in both CMC pose registration and ego-scene mapping, aided by a novel CMC processing pipeline and a dedicated collaborative framework. The work provides a practical tool for downstream position-aware applications and delivers a public dataset for evaluating collaborative scene mapping and CMC registration.

Abstract

Using ceiling-mounted cameras (CMCs) for indoor visual capturing opens up a wide range of applications. However, registering CMCs to the target scene layout presents a challenging task. While manual registration with specialized tools is inefficient and costly, automatic registration with visual localization may yield poor results when visual ambiguity exists. To alleviate these issues, we propose a novel solution for jointly mapping an indoor scene and registering CMCs to the scene layout. Our approach equips a mobile agent with a head-mounted RGB-D camera, has it traverse the entire scene once, and uses time-synchronized CMCs to capture the agent as it moves. The egocentric videos generate world-coordinate agent trajectories and the scene layout, while the videos of CMCs provide pseudo-scale agent trajectories and CMC relative poses. By correlating all the trajectories with their corresponding timestamps, the CMC relative poses can be aligned to the world-coordinate scene layout. Based on this initialization, a factor graph is customized to enable the joint optimization of ego-camera poses, scene layout, and CMC poses. We also develop a new dataset, setting the first benchmark for collaborative scene mapping and CMC registration (https://sites.google.com/view/yowo/home). Experimental results indicate that our method not only effectively accomplishes two tasks within a unified framework, but also jointly enhances their performance. We thus provide a reliable tool to facilitate downstream position-aware applications.
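
The timestamp-based alignment described above can be illustrated with a deliberately simplified sketch. It is not the paper's spatiotemporal registration: it assumes 2D ground-plane trajectories, noise-free synthetic data, and strictly increasing timestamps, and it fits a single similarity transform (scale, rotation, translation) in closed form using complex numbers. The names `interpolate`, `fit_similarity_2d`, `true_a`, and `true_b` are hypothetical and introduced only for illustration.

```python
import cmath
import math
from bisect import bisect_left


def interpolate(times, points, t):
    """Linearly interpolate a 2D trajectory at time t.

    Assumes `times` is strictly increasing; clamps outside the range.
    """
    i = bisect_left(times, t)
    if i == 0:
        return points[0]
    if i == len(times):
        return points[-1]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def fit_similarity_2d(src, dst):
    """Least-squares complex a, b such that dst ~= a * src + b.

    abs(a) is the scale and cmath.phase(a) the rotation angle.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


# Synthetic ground truth (assumption): scale 2, rotation 90 degrees, shift (1, -3).
true_a = cmath.rect(2.0, math.pi / 2)
true_b = complex(1.0, -3.0)

# World-coordinate agent trajectory, standing in for the ego-camera SLAM output.
ego_times = [0.0, 1.0, 2.0, 3.0, 4.0]
ego_xy = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 4.0)]

# Pseudo-scale agent trajectory, standing in for the CMC-side output.
# It is sampled at different timestamps and built by inverting the true transform.
cmc_times = [0.5, 1.5, 2.5, 3.5]
cmc_xy = []
for t in cmc_times:
    x, y = interpolate(ego_times, ego_xy, t)
    z = (complex(x, y) - true_b) / true_a
    cmc_xy.append((z.real, z.imag))

# Step 1: associate the trajectories by timestamp.
matched_world = [interpolate(ego_times, ego_xy, t) for t in cmc_times]

# Step 2: recover the transform from the pseudo-scale frame to the world frame.
a, b = fit_similarity_2d(cmc_xy, matched_world)
print("scale:", abs(a))
print("rotation (deg):", math.degrees(cmath.phase(a)))
print("translation:", (b.real, b.imag))
```

Once `a` and `b` are known, any position `z` expressed in the pseudo-scale CMC frame maps into the world frame as `a * z + b`, which is the planar analogue of placing the CMC relative poses into the world-coordinate scene layout. A full 6-DoF version would need a 3D similarity fit and would serve only as initialization for the joint optimization.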

Paper Structure

This paper contains 16 sections, 27 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Overview of scene mapping and CMC 6-DoF pose registration. While using mobile keypoints [lee2022extrinsic, patzold2022online] can estimate robust CMC relative 6-DoF poses (i), the application of SLAM & visual localization [taira2018inloc, sarlin2019coarse] can generate CMC 6-DoF poses registered to the scene (ii). We bring their merits into our YOWO (iii).
  • Figure 2: Visual ambiguity. Top: the divergence between CMC and ego-camera views in a supermarket scene. Bottom: the similar captures between different CMCs.
  • Figure 3: Architecture of YOWO. YOWO mainly includes three processes: ego-camera processing (Sec. \ref{sec:ego_pro}), CMC processing (Sec. \ref{sec:mul_pro}), and collaborative processing (Sec. \ref{sec:col_pro}). RGB/RGB-D videos, possibly with Inertial Measurement Unit (IMU) data, are the inputs, while the outputs are the scene layout and CMC 6-DoF poses registered to it.
  • Figure 3: Comparison with SOTAs on CMC 6-DoF pose registration. Since the estimated scene layouts have been aligned to their GTs, the registered CMC 6-DoF poses are directly compared to their GTs. The best results in each metric are highlighted in bold.
  • Figure 4: Comparison of classical KU matching and our spatiotemporal rebalanced matching.
  • ...and 5 more figures