Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping

Minseong Park, Suhan Woo, Euntai Kim

TL;DR

DNMap addresses memory bottlenecks in large-scale outdoor 3D mapping by replacing full continuous embeddings with a decomposition-based discrete representation whose embedding space is factorized into a shared bias embedding $e_b$ and $B$ offsets $\Delta{\mathbf{e}}_j$, composed through binary indicators $\mathbf{b}$ and stored via an octree feature volume. It augments these discrete embeddings with a low-resolution continuous embedding $c$ to provide global location cues, and uses a shallow decoder to predict the signed distance $d=\Phi(\mathbf{x})$, trained with a BCE-based $L_{sdf}$ loss and the Eikonal regularization $L_e$. Experiments on MaiCity and Newer College show substantial storage reductions while preserving or improving mapping quality compared with SHINE-Mapping and VQ-SHINE-Mapping, and DNMap remains trainable under memory budgets where baselines fail. By focusing on learning composition indicators rather than indexing the entire embedding space, DNMap achieves robust, scalable, memory-efficient large-scale 3D maps for outdoor environments.
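As a rough illustration of the composition described above, the following sketch builds discrete embeddings from a shared bias vector and $B$ offset vectors selected by binary indicators. The additive form $\mathbf{e} = e_b + \sum_j b_j \Delta\mathbf{e}_j$, the function name `compose`, the array shapes, and the random values are all assumptions made for illustration; the paper defines the exact composition rule and how the indicators are trained.

```python
import numpy as np

def compose(e_b, delta_e, b):
    """Compose embeddings as e = e_b + sum_j b_j * delta_e_j (assumed additive form).

    e_b:     (D,)     shared bias embedding
    delta_e: (B, D)   offset vectors shared across the whole map
    b:       (..., B) binary composition indicators, one row per voxel corner
    """
    b = np.asarray(b, dtype=e_b.dtype)
    return e_b + b @ delta_e

rng = np.random.default_rng(0)
B, D = 4, 8  # hypothetical sizes, not taken from the paper
e_b = rng.normal(size=D)
delta_e = rng.normal(size=(B, D))

# Enumerate every indicator pattern: B offsets can express 2^B embeddings.
patterns = (np.arange(2 ** B)[:, None] >> np.arange(B)) & 1
all_embeddings = compose(e_b, delta_e, patterns)
print(all_embeddings.shape)  # (16, 8)
```

Under this assumption, each voxel corner only needs to store $B$ bits, while the floating-point vectors are shared by all corners.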

Abstract

Learning efficient representations of local features is a key challenge in feature volume-based 3D neural mapping, especially in large-scale environments. In this paper, we introduce Decomposition-based Neural Mapping (DNMap), a storage-efficient large-scale 3D mapping method that employs a discrete representation based on a decomposition strategy. This decomposition strategy aims to efficiently capture repetitive and representative patterns of shapes by decomposing each discrete embedding into component vectors that are shared across the embedding space. Our DNMap optimizes a set of component vectors, rather than entire discrete embeddings, and learns composition rather than indexing the discrete embeddings. Furthermore, to complement the mapping quality, we additionally learn low-resolution continuous embeddings that require tiny storage space. By combining these representations with a shallow neural network and an efficient octree-based feature volume, our DNMap successfully approximates signed distance functions and compresses the feature volume while preserving mapping quality. Our source code is available at https://github.com/minseong-p/dnmap.
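To make the difference between optimizing component vectors and optimizing entire discrete embeddings concrete, the sketch below counts the shared floating-point values under two simplified assumptions: an index-based codebook with $2^B$ entries of dimension $D$, and a decomposition with one bias vector plus $B$ offset vectors. The helper names and the values of `B` and `D` are hypothetical, and the counts ignore the decoder, the continuous embeddings, and the per-corner bits.

```python
def codebook_floats(B, D):
    """Floats in a codebook that holds every one of the 2^B embeddings explicitly."""
    return (2 ** B) * D

def component_floats(B, D):
    """Floats in a decomposition: one shared bias vector plus B offset vectors."""
    return (B + 1) * D

B, D = 16, 8  # hypothetical sizes, not taken from the paper
print(codebook_floats(B, D))   # 524288
print(component_floats(B, D))  # 136
```

In both cases a voxel corner would store $B$ bits (an index or a set of indicators), so under these assumptions the saving comes from the shared vectors rather than from the per-corner data.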

Paper Structure (30 sections, 15 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Comparison between indexing [10.1145/3528233.3530727] and our proposed decomposition strategy. Indexing requires exploring the entire embedding space ($2^B$) to optimize discrete representations. In contrast, our approach only needs to learn $B$ indicators.
  • Figure 2: The pipeline of our DNMap. Given a query point $\mathbf{x}$, the voxel containing it is determined at each level of the octree $\mathcal{O}$. For each of these discrete embedding voxels, the voxel corner embeddings are composed from the component vector set $\mathcal{E}$ according to the composition indicators stored at the corners. The query embeddings $\mathbf{z}_0,\cdots,\mathbf{z}_{L-1},\mathbf{z}_c$ are obtained via trilinear interpolation of the voxel corner embeddings. Finally, the query embedding $\mathbf{z}$ is computed as the sum of all interpolated features, and the decoder $f_\Theta$ takes it as input and outputs a signed distance value.
  • Figure 3: Comparison between (a) basic implementation and (b) efficient implementation. In most cases during training, (b) is more efficient than (a) because $N_V \gg N_Q$, where $N_V$ is the total number of voxels and $N_Q$ is the number of query points processed at once.
  • Figure 4: Qualitative comparisons on the (a) MaiCity [vizzo2021icra] and (b) Newer College [ramezani2020newer] datasets. The highlighted areas of the figures are shown below.
  • Figure 5: The reconstruction result of incremental mapping on sequence 00 of the KITTI odometry dataset [kitti], using the component vector set and the decoder pre-trained on the Newer College dataset [ramezani2020newer].
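
The query path described in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: dense NumPy grids stand in for the octree $\mathcal{O}$, the additive composition is an assumption, the decoder is an untrained two-layer network with random weights, and all names and sizes (`interpolate`, `decode`, `levels`, `cont`) are hypothetical. The sketch composes embeddings only for the eight corners a query touches; whether that matches the efficient variant in Figure 3 is also an assumption.

```python
import numpy as np

OFFSETS = [(dx, dy, dz) for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

def interpolate(x, voxel_size, corner_fn):
    """Trilinearly interpolate corner embeddings of the voxel containing x."""
    g = np.asarray(x, dtype=float) / voxel_size
    base = np.floor(g).astype(int)
    fx, fy, fz = g - base
    z = 0.0
    for dx, dy, dz in OFFSETS:
        w = (fx if dx else 1 - fx) * (fy if dy else 1 - fy) * (fz if dz else 1 - fz)
        z = z + w * corner_fn(base[0] + dx, base[1] + dy, base[2] + dz)
    return z

def decode(z, W1, b1, w2, b2):
    """Shallow decoder stand-in: one hidden ReLU layer, scalar output."""
    h = np.maximum(W1 @ z + b1, 0.0)
    return float(w2 @ h + b2)

rng = np.random.default_rng(0)
B, D, H = 4, 8, 16  # hypothetical sizes
e_b = rng.normal(size=D)             # shared bias embedding
delta_e = rng.normal(size=(B, D))    # shared offset vectors

# (voxel size, binary indicators per grid corner) for each discrete level
levels = [(1.0, rng.integers(0, 2, size=(5, 5, 5, B))),
          (2.0, rng.integers(0, 2, size=(3, 3, 3, B)))]
cont = rng.normal(size=(2, 2, 2, D))  # low-resolution continuous embeddings

x = np.array([1.3, 2.7, 0.4])

# Discrete levels: compose corner embeddings on the fly, then interpolate.
z = sum(interpolate(x, s, lambda i, j, k, ind=ind: e_b + ind[i, j, k] @ delta_e)
        for s, ind in levels)
# Continuous level: interpolate the stored vectors directly.
z = z + interpolate(x, 4.0, lambda i, j, k: cont[i, j, k])

W1, b1 = rng.normal(size=(H, D)), np.zeros(H)
w2, b2 = rng.normal(size=H), 0.0
sdf = decode(z, W1, b1, w2, b2)
print(z.shape)  # (8,)
```

Because the trilinear weights sum to one, a voxel whose eight corners share the same embedding returns exactly that embedding for any query inside it.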