CRISTAL: Real-time Camera Registration in Static LiDAR Scans using Neural Rendering

Joni Vanherck, Steven Moonen, Brent Zoomers, Kobe Werner, Jeroen Put, Lode Jorissen, Nick Michiels

TL;DR

CRISTAL addresses drift and scale ambiguity in camera localization by localizing within a pre-captured colored LiDAR map. It introduces a neural point cloud renderer that synthesizes photorealistic views from the LiDAR scan, and two real-time pipelines: Online Render & Match (R&M) for immediate relocalization, and Prebuild & Localize (P&L), which builds a drift-free map offline that is compatible with standard SLAM. Evaluations on ScanNet++ and custom datasets demonstrate improved pose accuracy and photometric alignment over traditional SLAM, with the P&L approach enabling drift-free tracking in a global LiDAR frame. The work enables robust, real-time AR and robotics localization in large-scale environments using a single static LiDAR scan, and lays groundwork for dynamic map updates and lighting-variation experiments in future work.

Abstract

Accurate camera localization is crucial for robotics and Extended Reality (XR), enabling reliable navigation and alignment of virtual and real content. Existing visual methods often suffer from drift and scale ambiguity, and depend on fiducials or loop closure. This work introduces a real-time method for localizing a camera within a pre-captured, highly accurate colored LiDAR point cloud. By rendering synthetic views from this cloud, 2D-3D correspondences are established between live frames and the point cloud. A neural rendering technique narrows the domain gap between synthetic and real images, reducing occlusion and background artifacts to improve feature matching. The result is drift-free camera tracking with correct metric scale in the global LiDAR coordinate system. Two real-time variants are presented: Online Render and Match, and Prebuild and Localize. We demonstrate improved results on the ScanNet++ dataset and outperform existing SLAM pipelines.
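
To make the 2D-3D correspondence step concrete, the following is a minimal sketch of how a pixel in a rendered view, together with its rendered depth, could be lifted to a 3D point in the LiDAR frame. It is an illustration under stated assumptions and not the authors' implementation: the pinhole camera model, the camera-to-world pose convention (`R_wc`, `t_wc`), and the function names `backproject` and `project` are all hypothetical choices made here.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float,
                R_wc: Mat3, t_wc: Vec3) -> Vec3:
    """Lift pixel (u, v) with depth along the camera z axis to a world point.

    Assumes a pinhole model and a camera-to-world pose: p_w = R_wc * p_c + t_wc.
    """
    # Pixel -> point in the camera frame.
    p_c = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Camera frame -> world (LiDAR) frame.
    return tuple(
        sum(R_wc[i][j] * p_c[j] for j in range(3)) + t_wc[i] for i in range(3)
    )


def project(p_w: Vec3, fx: float, fy: float, cx: float, cy: float,
            R_wc: Mat3, t_wc: Vec3) -> Tuple[float, float]:
    """Inverse of backproject: world point -> pixel, using p_c = R_wc^T (p_w - t_wc)."""
    d = [p_w[i] - t_wc[i] for i in range(3)]
    p_c = [sum(R_wc[j][i] * d[j] for j in range(3)) for i in range(3)]
    return (fx * p_c[0] / p_c[2] + cx, fy * p_c[1] / p_c[2] + cy)


if __name__ == "__main__":
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    landmark = backproject(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0,
                           identity, (1.0, 2.0, 3.0))
    print("3D landmark in the LiDAR frame:", landmark)
```

A feature matched between the live frame and the rendered view would pair the live-frame pixel with a landmark obtained this way; a set of such pairs is the input a PnP solver needs to estimate the 6DoF pose.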

Paper Structure

This paper contains 14 sections, 4 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Overview of the adapted neural renderer. First, the per-pixel minimum depth is computed using z-buffering. In the second pass, colors of points projecting to the same pixel and within a small depth threshold are accumulated and then averaged. Remaining background leakage is removed via a hierarchical depth filter, and holes are filled using a U-Net.
  • Figure 2: Overview of the proposed online Render & Match (R&M) pipeline. Preprocessing: synthetic cubemap images and depth maps are rendered from uniformly sampled poses within the LiDAR point cloud, using the neural rendering method described in the neural rendering section. Features are extracted and back-projected to their 3D coordinates, forming a map database that links 2D descriptors to ground-truth 3D landmarks. Relocalization: when no prior pose is available, query image features are matched to the database to obtain a coarse 6DoF pose via PnP, which is refined by re-rendering the point cloud at the estimated pose. Tracking: during tracking, the previous pose is used to render a synthetic view, which is matched to the live camera frame to estimate the current 6DoF pose. Both stages operate without drift since all correspondences are derived from the LiDAR-based ground-truth geometry.
  • Figure 3: Overview of the proposed Prebuild & Localize (P&L) pipeline. Preprocessing: a set of rendered keyframes and landmarks is generated directly from the LiDAR point cloud, resulting in a compact and drift-free SLAM map created entirely offline. Relocalization/Tracking: the SLAM backend operates on this prebuilt map without modification, enabling real-time, drift-free 6DoF tracking.
  • Figure 4: Each image displays an overlay of the target camera frame and a neural rendering of the point cloud generated from the estimated camera pose. We show results for ScanNet++, the Render & Match approach, and the Prebuild & Localize method. Misalignment, visible as ghosting or blur, suggests pose estimation errors.
  • Figure 5: (a) Intel RealSense D455 setup with attached markers (rigid body) to capture the sequences. (b) Calibration image to determine the offset between the rigid body origin and the optical center of the camera.
  • ...and 4 more figures
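
The first two passes described in the Figure 1 caption (a z-buffer pass, then depth-thresholded color averaging) can be sketched as follows. This is a simplified illustration and not the paper's renderer: it assumes the points are already expressed in the camera frame, uses a pinhole projection, and applies an absolute depth threshold `depth_eps` whose value is chosen arbitrarily here. The hierarchical depth filter and the U-Net hole filling are omitted, and the function name `render_two_pass` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Color = Tuple[float, float, float]
Pixel = Tuple[int, int]


def render_two_pass(points: Sequence[Point], colors: Sequence[Color],
                    fx: float, fy: float, cx: float, cy: float,
                    width: int, height: int, depth_eps: float = 0.05
                    ) -> Tuple[Dict[Pixel, Color], Dict[Pixel, float]]:
    """Return (color image, depth map) as sparse dicts keyed by pixel (u, v).

    Pixels that no point projects to are simply absent (these are the holes).
    """
    # Pass 1: project every point and keep the minimum depth per pixel.
    zbuf: Dict[Pixel, float] = {}
    projected: List[Tuple[Pixel, float, Color]] = []
    for (x, y, z), color in zip(points, colors):
        if z <= 0.0:
            continue  # behind the camera
        u = int(math.floor(fx * x / z + cx))
        v = int(math.floor(fy * y / z + cy))
        if not (0 <= u < width and 0 <= v < height):
            continue  # outside the image
        projected.append(((u, v), z, color))
        if z < zbuf.get((u, v), math.inf):
            zbuf[(u, v)] = z

    # Pass 2: accumulate colors of points close to the front surface, then average.
    sums: Dict[Pixel, List[float]] = {}
    for pix, z, color in projected:
        if z - zbuf[pix] <= depth_eps:
            acc = sums.setdefault(pix, [0.0, 0.0, 0.0, 0.0])
            acc[0] += color[0]
            acc[1] += color[1]
            acc[2] += color[2]
            acc[3] += 1.0
    image = {pix: (a[0] / a[3], a[1] / a[3], a[2] / a[3])
             for pix, a in sums.items()}
    return image, zbuf


if __name__ == "__main__":
    pts = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.02), (0.0, 0.0, 3.0)]
    cols = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
    img, depth = render_two_pass(pts, cols, 100.0, 100.0, 2.0, 2.0, 4, 4)
    print(img, depth)
```

Averaging only the points near the minimum depth keeps the far (green) background point from bleeding into the foreground pixel, which is the kind of background leakage the caption refers to.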