Oriented-grid Encoder for 3D Implicit Representations

Arihant Gaur, G. Dias Pais, Pedro Miraldo

TL;DR

The paper addresses efficient and accurate 3D implicit representations by introducing an oriented-grid encoder that aligns multi-resolution grid cells to surface normals. It combines a dual-tree structure (structured octree and orientation tree) with a cylindrical interpolation scheme and a shared 3D CNN for local feature aggregation, producing rotation-invariant and smoother features. The method achieves state-of-the-art results across ABC, Thingi10k, ShapeNet, and Matterport3D, with faster convergence and sharper surfaces than regular-grid baselines. This approach holds promise for more robust object and scene reconstructions and may extend to neural radiance fields and large-scale scenes.

Abstract

Encoding 3D points is one of the primary steps in learning-based implicit scene representation. Using features that gather information from neighbors with multi-resolution grids has proven to be the best geometric encoder for this task. However, prior techniques do not exploit some characteristics of most objects or scenes, such as surface normals and local smoothness. This paper is the first to exploit those 3D characteristics in 3D geometric encoders explicitly. In contrast to prior work on using multiple levels of details, regular cube grids, and trilinear interpolation, we propose 3D-oriented grids with a novel cylindrical volumetric interpolation for modeling local planar invariance. In addition, we explicitly include a local feature aggregation for feature regularization and smoothing of the cylindrical interpolation features. We evaluate our approach on ABC, Thingi10k, ShapeNet, and Matterport3D, for object and scene representation. Compared to the use of regular grids, our geometric encoder is shown to converge in fewer steps and obtain sharper 3D surfaces. When compared to the prior techniques, our method gets state-of-the-art results.

Paper Structure (25 sections, 2 equations, 15 figures, 5 tables)

This paper contains 25 sections, 2 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: Teaser. The proposed multi-resolution oriented grid extends the octree representation with the object's normal directions; cells are rotated to an orientation tree and the respective LOD (left). During training, the feature of a sampled point within the rotated cell is interpolated according to a new cylindrical interpolation scheme; neighboring cell features are aggregated with a 3DCNN. These features can be used in the current state-of-the-art decoders for object representation, such as SDFs and Occupancy. The object is rendered from the implicit representation (right).
  • Figure 2: Oriented Grid Construction. Taking a 2D example with only one DoF ($\theta$), we model the structure of the object by using an octree, as shown in \ref{oriented_tree_rep1}. However, for each level, we use an orientation tree that searches for the appropriate rotation/anchor, as shown in \ref{oriented_tree_rep2}. From the orientation tree, we obtain the $\theta^l$ for LOD $l$ (as the middle point of limits $\delta^l$) in a coarse-to-fine manner that fits the object's surface. At each level, for each possible action, the angle produced by the query's normal $\mathbf{n}$ is compared against the angle range of each child. According to the result, we take the appropriate action $\alpha$, shown in bold in \ref{oriented_tree_rep2}, to obtain the next level anchor. The cells of the original octree in \ref{oriented_tree_rep1} are rotated according to the chosen anchor per level, obtaining the results shown in \ref{oriented_tree_rep3}.
  • Figure 3: 3D reconstruction pipeline. Graphical representation of a 3D surface reconstruction pipeline: grid-based and positional encoders are used, followed by a 3D reconstruction module. For each LOD $l$, the query point (in blue) is matched to a cell in the octree. The corresponding cell at each level aligns with the object's surface according to an anchor normal (\ref{subsec:irregular_grids}). A local aggregation 3DCNN computes the corresponding feature for each cell while considering its neighborhood (\ref{subsec:localfeatagg}). From the proposed cylindrical representation, a feature is interpolated (\ref{subsec:cylindrical_inter}) by evaluating the point's position inside the cylinder and the aggregated features. The final object is reconstructed from the interpolated feature, positional encoder, and normal encoder (\ref{subsec:3drecons}). Light gray boxes are the learnable layers in this figure, and the dark blocks represent the features.
  • Figure 4: Cylindrical interpolation. The input cell grid has a corresponding anchor's normal $\mathbf{n}_a$ obtained from \ref{subsec:irregular_grids}. The cylinder is aligned with the grid normal anchor, with a radius $R$ and height $H$. The interpolation scheme is of volumetric interpolation type. It depends on the distance of the query point $\mathbf{x}$ to the cylinder's height boundaries $h_1$ and $h_2$, and the distance between $\mathbf{x}$ and the cylindrical axis of symmetry, denoted as $r$. The first coefficient $c_0$ is computed from the distance of the point to the top plane $h_1$ and the difference in volumes considering $R$ and the point's distance to the axis of symmetry $r$ (orange). The coefficient $c_2$ is computed from the distance to the bottom plane $h_2$ and the difference in volumes considering $R$ and the point's distance to the axis of symmetry $r$ (green). Finally, $c_1$ is the remainder cylinder (blue). Each coefficient has an associated learnable feature $\mathbf{e}_k$ for $k \in \{0, 1, 2\}$. The interpolated feature $\mathbf{f}$ is the weighted average of $\mathbf{e}_k$ with $c_k$ weights.
  • Figure 5: Ablation example. Ablation effects in rendering (numbers in \ref{tab:ablations}). \ref{subfig:trilinear} represents the oriented encoder with trilinear interpolation; \ref{subfig:ci} adds cylindrical interpolation; \ref{subfig:333cnn} and \ref{subfig:555cnn} use $3 \times 3 \times 3$ and $5 \times 5 \times 5$ 3DCNN kernels for feature aggregation, respectively; and \ref{subfig:norm} adds normal regularization to \ref{subfig:555cnn}. \ref{subfig:GT} shows the ground truth.
  • ...and 10 more figures
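
The coarse-to-fine anchor search described in the Figure 2 caption can be sketched for the 2D, one-DoF case as follows. This is a minimal illustration and not the authors' implementation: the function name `orientation_anchors`, the initial angle range $[-\pi, \pi]$, and the choice of exactly two children per node (a binary split of the angle range) are assumptions made here for clarity. At each level the angle of the query normal is compared against the children's ranges, the matching child is selected, and the anchor $\theta^l$ is taken as the midpoint of that child's limits.

```python
import math


def orientation_anchors(normal, num_levels, lo=-math.pi, hi=math.pi):
    """Return one anchor angle theta^l per LOD, from coarse to fine.

    Hypothetical 2D sketch: `normal` is an (nx, ny) pair, and every node of
    the orientation tree is assumed to have two children that halve the
    parent's angle range [lo, hi].
    """
    angle = math.atan2(normal[1], normal[0])  # angle of the query normal
    anchors = []
    for _ in range(num_levels):
        mid = 0.5 * (lo + hi)
        # "Action": descend into the child whose range contains the angle.
        if angle < mid:
            hi = mid
        else:
            lo = mid
        # The anchor at this level is the midpoint of the chosen limits.
        anchors.append(0.5 * (lo + hi))
    return anchors


if __name__ == "__main__":
    for level, theta in enumerate(orientation_anchors((1.0, 1.0), 4), start=1):
        print(f"LOD {level}: anchor = {math.degrees(theta):.2f} deg")
```

Under these assumptions the chosen range halves at every level, so the anchor at level $l$ differs from the normal's angle by at most $\pi / 2^l$. The grid cell at that level would then be rotated by the anchor angle.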
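
The cylindrical interpolation weights described in the Figure 4 caption can be sketched as below. The caption does not give the exact formulas, so the expressions here are an assumption that reads the caption literally: $c_0$ and $c_2$ are taken as the outer-shell volume $\pi (R^2 - r^2)$ scaled by the distances $h_1$ and $h_2$ to the top and bottom planes, $c_1$ is the inner cylinder of radius $r$, and all three are normalized by the full volume $\pi R^2 H$ so that they sum to one. The function names, the cell `center` argument, and the clamping of the query to the cylinder are also hypothetical.

```python
import math


def cylindrical_weights(x, center, n_a, R, H):
    """Return (c0, c1, c2) for query x in a cylinder aligned with normal n_a.

    Assumed formulas (not confirmed by the source):
      c0 = (h1 / H) * (R^2 - r^2) / R^2
      c2 = (h2 / H) * (R^2 - r^2) / R^2
      c1 = r^2 / R^2
    """
    norm = math.sqrt(sum(v * v for v in n_a))
    n = [v / norm for v in n_a]                      # unit anchor normal
    d = [xi - ci for xi, ci in zip(x, center)]       # offset from the center
    t = sum(di * ni for di, ni in zip(d, n))         # signed offset along axis
    radial = [di - t * ni for di, ni in zip(d, n)]
    r = math.sqrt(sum(v * v for v in radial))        # distance to the axis
    # Clamp the query to the cylinder so the weights stay in [0, 1].
    t = max(-H / 2, min(H / 2, t))
    r = min(r, R)
    h1 = H / 2 - t                                   # distance to top plane
    h2 = H / 2 + t                                   # distance to bottom plane
    shell = (R * R - r * r) / (R * R)                # outer-shell fraction
    c0 = (h1 / H) * shell
    c2 = (h2 / H) * shell
    c1 = 1.0 - c0 - c2                               # equals r^2 / R^2
    return c0, c1, c2


def interpolate(weights, features):
    """Weighted average f = sum_k c_k * e_k over three feature vectors."""
    dim = len(features[0])
    return [sum(c * e[j] for c, e in zip(weights, features)) for j in range(dim)]


if __name__ == "__main__":
    e = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # toy features e_0, e_1, e_2
    w = cylindrical_weights((0.3, 0.2, 0.4), (0, 0, 0), (0, 0, 1), R=1.0, H=2.0)
    print(w, interpolate(w, e))
```

With this choice, a query on the axis midway between the planes splits its weight evenly between $\mathbf{e}_0$ and $\mathbf{e}_2$, while a query on the lateral surface ($r = R$) depends only on $\mathbf{e}_1$.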