
Hybrid Structure-from-Motion and Camera Relocalization for Enhanced Egocentric Localization

Jinjie Mai, Abdullah Hamdi, Silvio Giancola, Chen Zhao, Bernard Ghanem

TL;DR

The paper tackles VQ3D egocentric localization by re-localizing a target object relative to the wearer with a hybrid Structure-from-Motion (SfM) and 2D-3D matching approach. It introduces EgoLoc-v1, which augments SfM with 2D-3D matches against existing 3D scans to recover more camera poses, and lifts 2D detections into 3D via $[x,y,z,1]^T = T \begin{bmatrix} d\,K^{-1}[u,v,1]^T \\ 1 \end{bmatrix}$, where $(u,v)$ is the detection's pixel location, $d$ its depth, $K$ the camera intrinsic matrix, and $T$ the camera-to-world pose. On public benchmarks, EgoLoc-v1 achieves the best overall success rate, surpassing the prior EgoLoc by $1.5\%$, which demonstrates the value of scan-based relocalization for egocentric video. The method also exposes a trade-off between improved pose availability and higher data and speed requirements, motivating further work on scalable scan-assisted localization in real-world settings.
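A minimal NumPy sketch of the lifting step, assuming $K$ is a 3x3 pinhole intrinsic matrix, $T$ a 4x4 camera-to-world transform, and $d$ the z-depth at pixel $(u,v)$. The function name `lift_to_3d` is hypothetical and not taken from the EgoLoc-v1 code.

```python
import numpy as np


def lift_to_3d(u, v, depth, K, T_cam_to_world):
    """Back-project pixel (u, v) with z-depth `depth` into world coordinates.

    Assumptions (illustrative): K is a 3x3 pinhole intrinsic matrix and
    T_cam_to_world is a 4x4 rigid transform from camera to world frame.
    """
    pixel_h = np.array([u, v, 1.0])
    # K^{-1} [u, v, 1]^T gives a ray with z = 1; scaling by depth gives the camera-frame point.
    point_cam = depth * np.linalg.solve(K, pixel_h)
    # Append 1 so the 4x4 pose can be applied in homogeneous coordinates.
    point_world_h = T_cam_to_world @ np.append(point_cam, 1.0)
    return point_world_h[:3]
```

Using `np.linalg.solve` avoids forming $K^{-1}$ explicitly; the result is the same point the equation describes.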

Abstract

We build our pipeline, EgoLoc-v1, mainly on ideas from EgoLoc. We propose a model ensemble strategy to improve camera pose estimation in the VQ3D task, a component that previous work has shown to be essential. The core idea is to run not only SfM on the egocentric videos but also 2D-3D matching between the existing 3D scans and the 2D video frames. The result is a hybrid SfM and camera relocalization pipeline that recovers more camera poses, leading to a higher QwP (the fraction of queries with an estimated pose) and a higher overall success rate. Our method achieves the best performance on the most important metric, the overall success rate, surpassing the previous state of the art, the competitive EgoLoc, by $1.5\%$. The code is available at \url{https://github.com/Wayne-Mai/egoloc_v1}.
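To illustrate why extra relocalized poses raise QwP, here is a hedged sketch that merges per-frame poses from SfM and from scan-based relocalization and then counts queries that have a pose. The conflict rule (SfM wins when both sources cover a frame) and the rule that a query "has a pose" if any of its relevant frames does are assumptions for illustration; the paper's combination rule and the official VQ3D metric definition may differ. The names `merge_poses` and `query_with_pose_ratio` are hypothetical.

```python
from typing import Dict, Sequence

import numpy as np

Pose = np.ndarray  # assumed: 4x4 camera-to-world matrix


def merge_poses(sfm_poses: Dict[int, Pose], reloc_poses: Dict[int, Pose]) -> Dict[int, Pose]:
    """Union of per-frame poses; on conflicts the SfM pose is kept (illustrative choice)."""
    merged = dict(reloc_poses)
    merged.update(sfm_poses)  # SfM entries overwrite relocalized ones for shared frames
    return merged


def query_with_pose_ratio(query_frames: Sequence[Sequence[int]], poses: Dict[int, Pose]) -> float:
    """Fraction of queries for which at least one relevant frame has a pose (assumed rule)."""
    if not query_frames:
        return 0.0
    hits = sum(any(f in poses for f in frames) for frames in query_frames)
    return hits / len(query_frames)
```

With this rule, any frame covered only by relocalization can turn a query without a pose into one with a pose, so the merged ratio can never be lower than the SfM-only ratio.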

Paper Structure (7 sections, 1 equation, 1 figure, 1 table)

This paper contains 7 sections, 1 equation, 1 figure, and 1 table.

Figures (1)

  • Figure 1: Methodology. We propose a hybrid approach to estimate egocentric camera poses, which makes predictions available for more of the objects of interest. Assuming 3D keypoints are available, we use them to perform 2D-3D matching and PnP. We then combine the resulting camera poses with those from EgoLoc's SfM to make the final prediction, i.e., the 3D position $[x,y,z]$ of the retrieved object.
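The caption does not say which PnP solver is used. As a hedged illustration of the PnP step, the sketch below swaps in the textbook linear DLT formulation, assuming known intrinsics $K$ and at least six non-coplanar, outlier-free 2D-3D correspondences. The name `pnp_dlt` is hypothetical. It returns a world-to-camera pose $(R, t)$; the camera-to-world $T$ used for lifting is its inverse.

```python
import numpy as np


def pnp_dlt(points_3d, pixels, K):
    """Linear DLT PnP: world-to-camera (R, t) from >= 6 non-coplanar 2D-3D matches.

    Assumes noise-free, outlier-free correspondences and a known 3x3 intrinsic K.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    pixels = np.asarray(pixels, dtype=float)
    n = len(points_3d)
    if n < 6:
        raise ValueError("linear DLT PnP needs at least 6 correspondences")
    # Remove intrinsics: normalized image coordinates K^{-1} [u, v, 1]^T.
    pix_h = np.hstack([pixels, np.ones((n, 1))])
    norm = np.linalg.solve(K, pix_h.T).T  # shape (n, 3), last column == 1
    X_h = np.hstack([points_3d, np.ones((n, 1))])  # homogeneous 3D points
    # Each match yields two linear equations in the 12 entries of P = [R | t].
    A = np.zeros((2 * n, 12))
    for i, (Xi, (x, y, _)) in enumerate(zip(X_h, norm)):
        A[2 * i, 0:4] = Xi
        A[2 * i, 8:12] = -x * Xi
        A[2 * i + 1, 4:8] = Xi
        A[2 * i + 1, 8:12] = -y * Xi
    # Null vector of A: right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    # P is defined only up to scale and sign; choose the sign with det(P[:, :3]) > 0.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    # Project the left 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt_m = np.linalg.svd(P[:, :3])
    R = U @ Vt_m
    t = P[:, 3] / S.mean()
    return R, t
```

Real 2D-3D matches contain outliers, so in practice a step like this is wrapped in a robust estimator such as RANSAC around a minimal solver; OpenCV's `cv2.solvePnPRansac` is one widely used implementation.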