Learning Scene-Level Signed Directional Distance Function with Ellipsoidal Priors and Neural Residuals

Zhirui Dai, Hojoon Shin, Yulun Tian, Ki Myung Brian Lee, Nikolay Atanasov

TL;DR

This work introduces a scene-level signed directional distance function (SDDF) and a hybrid explicit-implicit model that combines an ellipsoid-based Prior Network with a neural Residual Network to enable fast, differentiable directional distance queries. The prior provides a coarse, differentiable geometric scaffold, while the Latent Feature Network and Residual Decoder refine fine details. The final prediction is composed as $\hat{f}(\mathbf{p},\mathbf{v}) = f(\mathbf{p},\mathbf{v}) + \delta_f$, where $f$ is the ellipsoid prior and $\delta_f$ is the predicted residual, and the model satisfies the directional Eikonal equation by construction. Empirically, the approach is competitive with state-of-the-art neural implicit scene models in reconstruction accuracy and rendering efficiency on the Replica, Gibson, and ScanNet datasets, while enabling differentiable viewpoint optimization for active navigation and exploration. This framework offers a practical pathway toward efficient, differentiable scene representations for robotics and automated exploration.

Abstract

Dense geometric environment representations are critical for autonomous mobile robot navigation and exploration. Recent work shows that implicit continuous representations of occupancy, signed distance, or radiance learned using neural networks offer advantages in reconstruction fidelity, efficiency, and differentiability over explicit discrete representations based on meshes, point clouds, and voxels. In this work, we explore a directional formulation of signed distance, called signed directional distance function (SDDF). Unlike signed distance function (SDF) and similar to neural radiance fields (NeRF), SDDF has a position and viewing direction as input. Like SDF and unlike NeRF, SDDF directly provides distance to the observed surface along the direction, rather than integrating along the view ray, allowing efficient view synthesis. To learn and predict scene-level SDDF efficiently, we develop a differentiable hybrid representation that combines explicit ellipsoid priors and implicit neural residuals. This approach allows the model to effectively handle large distance discontinuities around obstacle boundaries while preserving the ability for dense high-fidelity prediction. We show that SDDF is competitive with the state-of-the-art neural implicit scene models in terms of reconstruction accuracy and rendering efficiency, while allowing differentiable view prediction for robot trajectory optimization.
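To make the rendering contrast concrete, the following minimal sketch compares the two query styles on a single unit sphere. It is not the paper's model: `sphere_sdf`, `sphere_trace`, and `sphere_sddf` are hypothetical helper names, and the closed-form directional distance only stands in for a learned SDDF. The point is that sphere tracing an SDF needs several evaluations per ray, whereas a directional distance is obtained with one query.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(sum(x * x for x in p)) - radius

def sphere_trace(p, v, sdf, max_steps=64, tol=1e-6):
    """March along unit direction v using SDF values as safe step sizes.

    Returns (distance along the ray, number of SDF evaluations).
    """
    t = 0.0
    for step in range(1, max_steps + 1):
        d = sdf([pi + t * vi for pi, vi in zip(p, v)])
        if d < tol:
            return t, step
        t += d
    return math.inf, max_steps

def sphere_sddf(p, v, radius=1.0):
    """Directional distance from p along unit v to the sphere, in one query.

    Assumption for this sketch: the smaller root of the ray-sphere quadratic
    is returned, and a ray whose line misses the sphere returns infinity.
    """
    b = sum(pi * vi for pi, vi in zip(p, v))
    c = sum(pi * pi for pi in p) - radius * radius
    disc = b * b - c
    if disc < 0:
        return math.inf
    return -b - math.sqrt(disc)

# Illustrative query: an off-center ray toward the sphere.
p, v = [-3.0, 0.9, 0.0], [1.0, 0.0, 0.0]
t_trace, n_evals = sphere_trace(p, v, sphere_sdf)
t_direct = sphere_sddf(p, v)
```

Both routines agree on the hit distance up to the tracing tolerance; the difference is that `sphere_trace` evaluates the SDF repeatedly while `sphere_sddf` is evaluated once.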

Paper Structure

This paper contains 33 sections, 4 theorems, 41 equations, 12 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

Suppose an SDDF $f(\mathbf{p},\mathbf{v};\mathcal{O})$ is differentiable at $\mathbf{p} \in \mathbb{R}^n$. Then, it satisfies a directional Eikonal equation: $\nabla_{\mathbf{p}} f(\mathbf{p},\mathbf{v};\mathcal{O})^\top \mathbf{v} = -1$.
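One way to sanity-check this property, namely that the directional derivative of $f$ with respect to $\mathbf{p}$ along $\mathbf{v}$ equals $-1$, is a finite-difference test on a shape with a closed-form directional distance. The sketch below uses a single axis-aligned ellipsoid. The function names, the axis-aligned parameterization, and the convention for rays that miss are assumptions made for illustration and are not taken from the paper.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def ellipsoid_sddf(p, v, center, radii):
    """Signed directional distance from p along unit v to an axis-aligned ellipsoid.

    Assumption for this sketch: the entry point (smaller root) is used, so the
    value is negative when p is inside; rays that miss, or that have the
    ellipsoid entirely behind them, return infinity.
    """
    # Map the ray into the frame where the ellipsoid is a unit sphere.
    a = [(pi - ci) / ri for pi, ci, ri in zip(p, center, radii)]
    b = [vi / ri for vi, ri in zip(v, radii)]
    A = sum(x * x for x in b)
    B = sum(x * y for x, y in zip(a, b))
    C = sum(x * x for x in a) - 1.0
    disc = B * B - A * C          # from A t^2 + 2 B t + C = 0
    if disc < 0:
        return math.inf
    root = math.sqrt(disc)
    t_near, t_far = (-B - root) / A, (-B + root) / A
    if t_far < 0:
        return math.inf
    return t_near

def directional_derivative(p, v, center, radii, eps=1e-5):
    """Central-difference estimate of grad_p f(p, v) dotted with v."""
    grad = []
    for k in range(3):
        hi, lo = list(p), list(p)
        hi[k] += eps
        lo[k] -= eps
        f_hi = ellipsoid_sddf(hi, v, center, radii)
        f_lo = ellipsoid_sddf(lo, v, center, radii)
        grad.append((f_hi - f_lo) / (2 * eps))
    return sum(g * vi for g, vi in zip(grad, v))

# Illustrative query: the result should be close to -1.
v = normalize([1.0, 0.05, -0.02])
dd = directional_derivative([-3.0, 0.2, 0.1], v, [0.0, 0.0, 0.0], [1.0, 0.5, 0.75])
```

Intuitively, moving the query point a distance $t$ forward along $\mathbf{v}$ shortens the distance to the surface by exactly $t$, which is what the finite-difference estimate reflects.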

Figures (12)

  • Figure 1: (a), (c): We present a method to learn scene-level signed directional distance function (SDDF). (a), (b), (d): Our method uses ellipsoids as an initial coarse approximation of the shapes of objects in the environment. (e), (f): The ellipsoid prior is refined by a latent feature network and a shared decoder to predict the surface reconstruction residual. (f), (g): Our SDDF learning method offers single-query differentiable novel distance image synthesis without RGB supervision as an alternative to Gaussian Splat distance rendering (e.g., RaDe-GS \cite{radegs2024}) or signed distance function sphere tracing (e.g., InstantNGP \cite{instantngp2022}).
  • Figure 2: Example of our scene-level SDDF, the object-level SDDF of \cite{zobeidi2021}, and the DDF of \cite{pddf2022} in a 2D synthetic environment. A range sensor (red triangle) with pose $\mathbf{T}_t \in SE(2)$ is measuring the distance to a doughnut-like obstacle $\mathcal{O}$ (black) with a triangular hole in the middle (left plot). At time $t$, the sensor measurement $\mathcal{Z}_t=\{\theta_i,r_{i,t}\}_{i=1}^N$ consists of $N$ range measurements $r_{i,t}$ obtained along rays (green lines) cast at angles $\theta_i$. The red arrow in the three plots on the right marks the viewing direction. Unlike DDF, our SDDF definition is continuous when transitioning from free to occupied space along the viewing direction. Compared with the object-level SDDF of \cite{zobeidi2021}, our SDDF definition reflects the geometry well, allowing scene-level reconstruction.
  • Figure 3: Method overview. Given a query ray from position $\mathbf{p}\in\mathbb{R}^3$ in direction $\mathbf{v}\in\mathbb{S}^2$, an ellipsoid-based Prior network $P$ uses $M$ ellipsoids $\{\boldsymbol{\xi}_i,\mathbf{r}_i\}_{i=1}^M$ to learn the rough shape of the environment such that it can determine the closest ellipsoid intersected by the ray and predict an SDDF prior. Then, with the intersection point $\mathbf{q}\in\mathbb{R}^3$ and ray direction $\mathbf{v}'\in\mathbb{S}^2$ in the ellipsoid's local frame, a Latent network $L$ generates a latent feature $\mathbf{z}\in\mathbb{R}^m$, which is decoded by the Residual decoder $R$ into residual predictions $\left(\delta_i,\delta_s,\delta_f\right)$, i.e. the difference between the ground truth and the prior. Finally, we compose the SDDF prediction as $\hat{f}=f+\delta_f$. Blue arrows show the data flow in the forward pass, while red arrows represent the backward pass.
  • Figure 4: 2D visualization of the single ellipsoid SDDF $f(\mathbf{p},\mathbf{v}; \mathcal{E})$ in Eq. \ref{eq:single_ellipsoid_sddf_prior} for fixed $\mathbf{v}$ and varying $\mathbf{p}$.
  • Figure 5: Comparison of ellipsoid initialization algorithms. The left column is generated by Alg. \ref{alg:multi_ellipsoid_init} and the right column by the K-means++ algorithm \cite{kmeanspp2007} only (i.e., lines 3 to 5 of Alg. \ref{alg:multi_ellipsoid_init}). (a) Using Alg. \ref{alg:multi_ellipsoid_init}, a few ellipsoids are used to approximate planar surfaces like the ceiling, walls, and ground. (b) Using K-means++ \cite{kmeanspp2007} only, too many ellipsoids are used to approximate planar surfaces. (c) and (d) show close-ups of the indoor objects. The table (in the green box) and the plant (in the red box) are better approximated by ellipsoids from Alg. \ref{alg:multi_ellipsoid_init}.
  • ...and 7 more figures
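
The data flow described in the Figure 3 caption (prior over $M$ ellipsoids, then a residual added to the prior) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: ellipsoids are axis-aligned `(center, radii)` pairs instead of full poses $\boldsymbol{\xi}_i$, the query point is assumed to lie in free space, the latent network and decoder are replaced by a hypothetical callable `residual_fn` that returns only $\delta_f$, and all function names are invented for this example.

```python
import math

def ray_ellipsoid(p, v, center, radii):
    """Entry distance of the ray p + t v into an axis-aligned ellipsoid, or inf on a miss."""
    a = [(pi - ci) / ri for pi, ci, ri in zip(p, center, radii)]
    b = [vi / ri for vi, ri in zip(v, radii)]
    A = sum(x * x for x in b)
    B = sum(x * y for x, y in zip(a, b))
    C = sum(x * x for x in a) - 1.0
    disc = B * B - A * C
    if disc < 0:
        return math.inf
    root = math.sqrt(disc)
    if (-B + root) / A < 0:       # ellipsoid lies entirely behind the ray origin
        return math.inf
    return (-B - root) / A

def prior_sddf(p, v, ellipsoids):
    """Return (prior distance, index) of the closest ellipsoid hit by the ray."""
    best_d, best_idx = math.inf, None
    for idx, (center, radii) in enumerate(ellipsoids):
        d = ray_ellipsoid(p, v, center, radii)
        if d < best_d:
            best_d, best_idx = d, idx
    return best_d, best_idx

def predict_sddf(p, v, ellipsoids, residual_fn):
    """Compose the prediction as prior plus residual (f_hat = f + delta_f)."""
    f, idx = prior_sddf(p, v, ellipsoids)
    if idx is None:
        return math.inf
    center, _ = ellipsoids[idx]
    # Intersection point expressed relative to the selected ellipsoid's center.
    # With axis-aligned ellipsoids the local ray direction equals v.
    q = [pi + f * vi - ci for pi, vi, ci in zip(p, v, center)]
    delta_f = residual_fn(q, v)   # hypothetical stand-in for the latent network and decoder
    return f + delta_f

# Illustrative scene: two unit spheres and a constant placeholder residual.
scene = [([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]), ([5.0, 0.0, 0.0], [1.0, 1.0, 1.0])]
f_hat = predict_sddf([-3.0, 0.0, 0.0], [1.0, 0.0, 0.0], scene, lambda q, v: 0.1)
```

In the actual method the residual comes from learned networks conditioned on the intersection point and local ray direction; the constant lambda here only shows where that prediction enters the composition.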

Theorems & Definitions (9)

  • Definition 1
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Proposition 4
  • proof