
PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model

Yunqian Cheng, Benjamin Princen, Roberto Manduchi

TL;DR

This work addresses GPS-denied indoor localization by proposing PALMS+, a modular image-based system that uses a depth foundation model to reconstruct scale-correct 3D geometry from a stationary camera scan and then matches this geometry to a floor plan via a kernel-based layout method to produce a posterior heatmap over $SE(2)$ poses. It introduces a scale-alignment step to fix monocular depth ambiguity and employs a two-module pipeline (Observation and Layout Matching) that yields both direct pose estimates and priors for sequential tracking with a particle filter. Key contributions include the use of monocular-depth geometry for long-range observation, robust orientation candidate extraction, and a scale-marginalized heatmap approach, all without model training. Experiments on Structured3D and a custom campus dataset show PALMS+ outperforms PALMS and F3Loc for stationary localization and reduces tracking error in sequential localization, demonstrating a scalable path toward infrastructure-free indoor navigation.
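The scale-alignment idea can be illustrated with a small sketch. It is not the authors' implementation: it assumes that corresponding depths for the same scene points are already available in two overlapping views, and it uses a median depth ratio as the scale estimate, which is a common robust choice rather than something stated in the source. The function names `overlap_scale` and `rescale_points` are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def overlap_scale(ref_depths: Sequence[float], new_depths: Sequence[float]) -> float:
    """Estimate the factor that maps new_depths onto ref_depths.

    Assumption: ref_depths[i] and new_depths[i] measure the same scene point
    in two overlapping views. The median ratio is used so that a few bad
    correspondences do not dominate the estimate.
    """
    ratios = [r / n for r, n in zip(ref_depths, new_depths) if r > 0 and n > 0]
    if not ratios:
        raise ValueError("no valid overlapping depths")
    return median(ratios)


def rescale_points(points: Sequence[Point], scale: float) -> List[Point]:
    """Scale camera-frame points about the camera origin."""
    return [(x * scale, y * scale, z * scale) for x, y, z in points]


# Three consistent correspondences and one outlier (the last pair).
scale = overlap_scale([2.0, 4.0, 6.0, 100.0], [1.0, 2.0, 3.0, 1.0])
aligned = rescale_points([(1.0, 2.0, 3.0)], scale)
```

With the toy values above, three of the four ratios agree, so the outlier pair does not change the estimated scale.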

Abstract

Indoor localization in GPS-denied environments is crucial for applications like emergency response and assistive navigation. Vision-based methods such as PALMS enable infrastructure-free localization using only a floor plan and a stationary scan, but are limited by the short range of smartphone LiDAR and ambiguity in indoor layouts. We propose PALMS+, a modular, image-based system that addresses these challenges by reconstructing scale-aligned 3D point clouds from posed RGB images using a foundation monocular depth estimation model (Depth Pro), followed by geometric layout matching via convolution with the floor plan. PALMS+ outputs a posterior over location and orientation, usable for direct or sequential localization. Evaluated on the Structured3D dataset and a custom campus dataset consisting of 80 observations across four large campus buildings, PALMS+ outperforms PALMS and F3Loc in stationary localization accuracy without requiring any training. Furthermore, when integrated with a particle filter for sequential localization on 33 real-world trajectories, PALMS+ achieves lower localization errors than the other methods, demonstrating robustness for camera-free tracking and its potential for infrastructure-free applications. Code and data are available at https://github.com/Head-inthe-Cloud/PALMS-Plane-based-Accessible-Indoor-Localization-Using-Mobile-Smartphones
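To make "layout matching via convolution with the floor plan" concrete, here is a minimal sketch under strong simplifying assumptions. The floor plan and the observed geometry are both tiny binary wall grids, only four 90-degree orientation candidates are tried, and the score is a plain overlap count. The actual kernel, orientation candidates, and scale handling in PALMS+ are not specified here, and the names `rotate90`, `match_heatmap`, and `localize` are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def rotate90(kernel: Grid, k: int) -> Grid:
    """Rotate a grid counter-clockwise (in array-index terms) by k * 90 degrees."""
    for _ in range(k % 4):
        kernel = [list(row) for row in zip(*kernel)][::-1]
    return kernel


def match_heatmap(floor_plan: Grid, kernel: Grid) -> List[List[float]]:
    """Slide the kernel over the floor plan and count overlapping wall cells.

    Only positions where the kernel fits entirely inside the plan are scored.
    """
    H, W = len(floor_plan), len(floor_plan[0])
    h, w = len(kernel), len(kernel[0])
    heat = []
    for r in range(H - h + 1):
        row = []
        for c in range(W - w + 1):
            score = sum(floor_plan[r + i][c + j] * kernel[i][j]
                        for i in range(h) for j in range(w))
            row.append(float(score))
        heat.append(row)
    return heat


def localize(floor_plan: Grid, observation: Grid) -> Tuple[float, int, int, int]:
    """Return (score, row, col, rotation index) of the best-scoring pose."""
    best = None
    for k in range(4):
        heat = match_heatmap(floor_plan, rotate90(observation, k))
        for r, row in enumerate(heat):
            for c, s in enumerate(row):
                if best is None or s > best[0]:
                    best = (s, r, c, k)
    return best


floor_plan = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# An L-shaped corner, observed in a rotated frame.
observation = [
    [1, 0],
    [1, 1],
]
best_pose = localize(floor_plan, observation)
```

Stacking the per-rotation score maps, instead of keeping only the maximum, gives a discrete heatmap over position and orientation, which is the kind of output the abstract describes as a posterior.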

Paper Structure

This paper contains 24 sections, 4 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: The PALMS+ image-based floor plan localization method.
  • Figure 2: Our point cloud alignment method. (a) Overlap alignment method (top, sub-point clouds used for scale alignment are colored in red) and ground-plane alignment method (bottom, detected ground plane is colored in yellow). (b) An example from our custom dataset, two images on the left show a partial view before and after alignment, and the final aligned point cloud is shown on the right. (c-d) Two examples from the Structured3D dataset, showing the point clouds before (left) and after (right) alignment.
  • Figure 3: Qualitative analysis. We removed the orientation dimension from the heatmaps by taking the max. Each heatmap is then normalized to a PDF. (a) PALMS+'s projected geometry from the reconstructed point cloud. (b) PALMS+ heatmap output. (c) F$^3$Loc heatmap output. (d) PALMS's projected vertical planes extracted by ARKit LiDAR. (e) PALMS heatmap output. (f) Zoomed-in view for the heatmap around the ground-truth location for all three methods. The green circle on the heatmaps indicates the ground-truth location.
  • Figure 4: An example from the S3D dataset (same as Figure 2d), showing the observed line segments (left), and the ambiguous posterior distribution (right), which contains an area of high probability in another room away from the ground-truth.
  • Figure 5: An example of the "perfect observations" sampled from the floor plan (left), and heatmaps obtained using corresponding layout matching methods (right). Top: F$^3$Loc. Bottom: PALMS+. The orange bounding boxes outline the same regions on the left.
  • ...and 6 more figures
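The heatmap post-processing described in the Figure 3 caption (taking the max over the orientation dimension, then normalizing to a PDF) can be sketched as follows. The nested-list layout `heatmap[k][r][c]`, with `k` indexing orientation, and the function name `collapse_orientation` are assumptions made for illustration only.

```python
from typing import List

Heatmap3D = List[List[List[float]]]


def collapse_orientation(heatmap: Heatmap3D) -> List[List[float]]:
    """Take the max over the orientation index, then normalise to sum to 1.

    Assumes heatmap[k][r][c] holds a non-negative score for orientation k
    at grid cell (r, c).
    """
    n_rows, n_cols = len(heatmap[0]), len(heatmap[0][0])
    flat = [[max(layer[r][c] for layer in heatmap) for c in range(n_cols)]
            for r in range(n_rows)]
    total = sum(sum(row) for row in flat)
    if total <= 0:
        raise ValueError("heatmap has no positive mass")
    return [[v / total for v in row] for row in flat]


# Two orientation layers over a 2x2 grid.
scores = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [0.0, 0.0]],
]
pdf = collapse_orientation(scores)
```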