Differentiable Product Quantization for Memory Efficient Camera Relocalization

Zakaria Laskar, Iaroslav Melekhov, Assia Benbihi, Shuzhe Wang, Juho Kannala

TL;DR

Memory-heavy 3D scene maps limit camera relocalization deployment. The paper introduces a standalone Differentiable Product Quantization with a tiny scene-specific dequantizer (D-PQED) trained with margin-based metric losses to preserve descriptor matching under extreme compression, and studies its combination with map compression to explore memory-efficiency trade-offs. It demonstrates superior memory-accuracy trade-offs on Aachen Day-Night and other benchmarks, supported by extensive ablations of losses and matchers. The approach enables accurate relocalization at very small memory footprints and informs the design of hybrid map/descriptor compression for real-world systems.

Abstract

Camera relocalization relies on 3D models of the scene with a large memory footprint that is incompatible with the memory budget of several applications. One solution to reduce the scene memory size is map compression, which removes certain 3D points and quantizes descriptors. This achieves high compression but leads to a performance drop due to information loss. To address the memory-performance trade-off, we train a light-weight scene-specific auto-encoder network that performs descriptor quantization-dequantization in an end-to-end differentiable manner, updating both product quantization centroids and network parameters through back-propagation. In addition to optimizing the network for descriptor reconstruction, we encourage it to preserve the descriptor-matching performance with margin-based metric loss functions. Results show that for a local descriptor memory of only 1MB, the synergistic combination of the proposed network and map compression achieves the best performance on the Aachen Day-Night dataset compared to existing compression methods.
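
To make the phrase "margin-based metric loss" concrete, here is a minimal sketch of one common loss of that family, the triplet margin loss. This section does not give the paper's exact formulation, so the choice of triplet loss, the function names, the margin value, and the roles of the three descriptors (anchor = reconstructed descriptor, positive = its matching descriptor, negative = a non-matching descriptor) are all assumptions made for illustration.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors given as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hypothetical margin-based metric loss (standard triplet form).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly with the gap.
    """
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Illustrative 2-D "descriptors" (real local descriptors are much longer).
reconstructed = [0.0, 0.0]
matching = [0.1, 0.0]
non_matching = [1.0, 0.0]
loss_easy = triplet_margin_loss(reconstructed, matching, non_matching)
loss_hard = triplet_margin_loss(reconstructed, non_matching, non_matching)
```

In this toy setup the first triplet is already separated by more than the margin, so it contributes no loss, while the second triplet (positive and negative equally far away) is penalized by exactly the margin.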

Paper Structure (16 sections, 8 equations, 7 figures, 8 tables, 1 algorithm)

Figures (7)

  • Figure 1: Overview. In this work we use differentiable Product Quantization to perform memory-efficient camera relocalization. More specifically, a set of local image descriptors extracted from an image is fed into an encoder $\mathcal{E}$ parameterized by the $M$ codebooks that are used to obtain a quantized representation of the input vectors. The quantized descriptors are then passed into the scene-specific differentiable decoder $\mathcal{D}$ that can recover the original descriptors, which are subsequently used in the localization pipeline. The encoder and decoder together represent a layer called D-PQED.
  • Figure 1: Qualitative results on Aachen: SuperPoint vs. PQ4 descriptors. We evaluate both descriptors on the Aachen Day-Night localization dataset [sattler2018benchmarking, Sattler2012BMVC] and visualize the local descriptor correspondences produced by the SuperGlue matcher [sarlin2020superglue]. Despite their significantly lower memory consumption, the hard PQ descriptors struggle to provide accurate correspondences, leading to weak localization performance. Please zoom in to see the details.
  • Figure 2: Qualitative results: PQ vs. D-PQED descriptors. We evaluate both descriptors on the Aachen Day-Night and Cambridge Landmarks localization datasets [sattler2018benchmarking, Sattler2012BMVC, kendall2015posenet] and visualize the local descriptor correspondences (in colors) produced by the SuperGlue matcher [sarlin2020superglue]. Both quantization techniques have a similar memory budget of 4MB. The proposed D-PQED layer provides more accurate correspondences, leading to better localization performance.
  • Figure 2: Qualitative results on Aachen: SuperPoint vs. proposed D-PQED descriptors. Similar to Fig. \ref{sup_fig:superpoint_pq64_aachen}, both descriptors are evaluated on the Aachen Day-Night localization dataset and the SuperGlue matcher [sarlin2020superglue] is used to establish 2D-2D inliers. In contrast to its non-differentiable counterpart (cf. Fig. \ref{sup_fig:superpoint_pq64_aachen}), the proposed D-PQED layer produces better correspondences.
  • Figure 3: Qualitative results on Cambridge: SuperPoint vs. PQ4 descriptors. Similar to Fig. \ref{sup_fig:superpoint_pq64_aachen} and Fig. \ref{sup_fig:superpoint_nipq64_aachen}, we evaluate both descriptors on the Cambridge Landmarks localization dataset and use the SuperGlue matcher [sarlin2020superglue] to establish a set of 2D-2D inliers. The hard PQ descriptors fall short of producing reliable correspondences. Please zoom in to see the details.
  • ...and 2 more figures
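
The overview caption describes an encoder built from $M$ codebooks. As a rough illustration of what such an encoder stores, the sketch below implements plain (hard, non-differentiable) product quantization in pure Python, not the paper's D-PQED layer. The function names, the random codebooks, and the sizes (256-dimensional descriptors, M = 4 codebooks of K = 256 centroids) are assumptions chosen for the example; in practice the centroids would be learned, and the learned decoder is omitted here.

```python
import random

def split(vec, m):
    """Split a vector into m contiguous sub-vectors of equal length."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec, codebooks):
    """Hard assignment: one nearest-centroid index per sub-vector."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]

def pq_decode(codes, codebooks):
    """Approximate the descriptor by concatenating the selected centroids."""
    out = []
    for k, cb in zip(codes, codebooks):
        out.extend(cb[k])
    return out

def code_bytes(m, k):
    """Bytes per stored code: m indices of ceil(log2(k)) bits each."""
    bits = m * max(1, (k - 1).bit_length())
    return (bits + 7) // 8

# Illustrative sizes only (assumed, not taken from the paper).
D, M, K = 256, 4, 256
rng = random.Random(0)
codebooks = [[[rng.gauss(0.0, 1.0) for _ in range(D // M)]
              for _ in range(K)] for _ in range(M)]

descriptor = [rng.gauss(0.0, 1.0) for _ in range(D)]
codes = pq_encode(descriptor, codebooks)      # M small integers
approx = pq_decode(codes, codebooks)          # D floats again
raw_bytes = D * 4                             # float32 storage
compressed_bytes = code_bytes(M, K)
```

Under these assumed sizes a descriptor shrinks from 1024 bytes of float32 values to 4 bytes of indices, plus the one-off cost of the shared codebooks. The hard `min` in `pq_encode` is not differentiable, which is the limitation the differentiable formulation is meant to remove.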