SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments

Niklas Gard; Anna Hilsmann; Peter Eisert

TL;DR

SPVLoc tackles indoor 6D localization in unseen environments by cross‑domain matching of a perspective RGB image to semantically annotated panoramas rendered from a minimalist 3D model. A RetinaNet‑style viewport predictor, depth‑wise feature correlation, and a Pose head enable end‑to‑end estimation of the 6DoF pose from the best panorama, with refinement possible via re‑rendered views. The method relies on synthetic semantic panoramas to bridge domain gaps and uses a multi‑task loss with learned uncertainty to balance viewport and pose objectives. Across Structured3D and Zillow Indoor datasets, SPVLoc achieves superior 6DoF localization accuracy with sparse panoramas, generalizes to unseen scenes, and demonstrates practical efficiency suitable for large indoor environments and potential AR applications.
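The multi-task loss with learned uncertainty mentioned above can be illustrated with a minimal sketch. It assumes the common formulation in which each task loss L_i is scaled by exp(-s_i) and regularized by s_i, where s_i is a learnable log-variance. The summary does not state the exact form used by SPVLoc, so the function name, the argument names, and the formula itself are assumptions made for illustration.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses using learnable log-variances.

    Assumed form (not confirmed by the source): sum_i exp(-s_i) * L_i + s_i.
    A large s_i down-weights task i, while the additive s_i term keeps the
    optimizer from inflating every s_i to ignore all tasks.
    """
    if len(losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Hypothetical values: one viewport loss and one pose loss, both with s = 0,
# which reduces the combination to a plain sum of the two losses.
total = uncertainty_weighted_loss([2.0, 3.0], [0.0, 0.0])
```

In a training framework the log-variances would be optimized jointly with the network weights; here they are plain floats so that the weighting can be inspected in isolation.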

Abstract

In this paper, we present SPVLoc, a global indoor localization method that accurately determines the six-dimensional (6D) camera pose of a query image and requires minimal scene-specific prior knowledge and no scene-specific training. Our approach employs a novel matching procedure to localize the perspective camera's viewport, given as an RGB image, within a set of panoramic semantic layout representations of the indoor environment. The panoramas are rendered from an untextured 3D reference model, which only comprises approximate structural information about room shapes, along with door and window annotations. We demonstrate that a straightforward convolutional network structure can successfully achieve image-to-panorama and ultimately image-to-model matching. Through a viewport classification score, we rank reference panoramas and select the best match for the query image. Then, a 6D relative pose is estimated between the chosen panorama and query image. Our experiments demonstrate that this approach not only efficiently bridges the domain gap but also generalizes well to previously unseen scenes that are not part of the training data. Moreover, it achieves superior localization accuracy compared to state-of-the-art methods and also estimates more degrees of freedom of the camera pose. Our source code is publicly available at https://fraunhoferhhi.github.io/spvloc.
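The two inference steps described in the abstract (rank the reference panoramas by viewport classification score, then apply the relative pose estimated for the best one) can be sketched as follows. This is only an illustration under stated assumptions: poses are represented as 4x4 homogeneous matrices, the scores and relative poses are taken as given network outputs, and the helper names `translation`, `matmul4`, and `localize` are hypothetical and not part of the SPVLoc code base.

```python
def translation(x, y, z):
    """4x4 homogeneous transform with identity rotation (illustrative helper)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def localize(pano_poses, scores, relative_poses):
    """Select the best-scoring panorama and compose its pose with the relative pose.

    pano_poses[i]     : assumed world-from-panorama transform of panorama i
    scores[i]         : assumed viewport classification score of panorama i
    relative_poses[i] : assumed panorama-from-camera transform predicted for i
    Returns (index of best panorama, world-from-camera transform).
    """
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, matmul4(pano_poses[best], relative_poses[best])


# Hypothetical example: two panoramas, the second one scores higher.
pano_poses = [translation(0.0, 0.0, 0.0), translation(2.0, 0.0, 1.0)]
scores = [0.1, 0.9]
relative_poses = [translation(0.0, 0.0, 0.0), translation(0.5, -0.5, 0.0)]
best, camera_pose = localize(pano_poses, scores, relative_poses)
camera_position = [row[3] for row in camera_pose[:3]]
```

The example uses translation-only transforms so that the composition is easy to check by hand; a full 6D pose would carry a rotation in the upper-left 3x3 block of each matrix.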

Paper Structure (45 sections, 1 equation, 11 figures, 8 tables)

Introduction
Related Work
Indoor Camera Localization.
Relative Pose Regression.
Perspective-to-Panorama Matching.
Method
Semantic Panoramic Viewport Matching
Problem Definition.
Viewport Prediction.
Network Architecture.
Perspective Supervision.
Feature-Correlation-based Pose Regression
Optimization
Inference
Refinement.
...and 30 more sections

Figures (11)

Figure 1: SPVLoc for 6D indoor localization. Our method calculates the indoor 6D camera pose by determining the image position and orientation relative to synthetic panoramas. The best panoramic match is found through semantic viewport matching.
Figure 2: Network architecture overview. The information from the query branch is fed into the panorama branch by depth-wise correlation. Three task heads predict the corresponding viewport, one task head predicts the relative pose offset.
Figure 3: The Pose head utilizes convolutions, an MLP, and a FoV-based side input.
Figure 4: ZinD data preparation. Annotations generate 3D reference models (left), while resampled panoramas create perspective train and test images (right).
Figure 5: Qualitative localization results. Top to bottom - query, rendering with top-1 estimated pose, panorama with estimated viewport, map. Green box: success for top-1 match. Yellow box: success for top-2 match. Red box: failure case.
...and 6 more figures
