Efficient Construction of Implicit Surface Models From a Single Image for Motion Generation

Wei-Teng Chu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi

TL;DR

This work proposes Fast Image-to-Neural Surface (FINS), a lightweight framework that reconstructs high-fidelity surfaces and signed distance function (SDF) fields from a single RGB image or a small set of images.

Abstract

Implicit representations have been widely applied in robotics for obstacle avoidance and path planning. In this paper, we explore the problem of constructing an implicit distance representation from a single image. Past methods for implicit surface reconstruction, such as NeuS and its variants, generally require a large set of multi-view images as input and long training times. In this work, we propose Fast Image-to-Neural Surface (FINS), a lightweight framework that can reconstruct high-fidelity surfaces and SDF fields from a single image or a small set of images. FINS integrates a multi-resolution hash grid encoder with lightweight geometry and color heads, so that training with an approximate second-order optimizer is highly efficient and converges within a few seconds. Additionally, we construct a neural surface from only a single RGB image by leveraging pre-trained foundation models to estimate the geometry inherent in the image. Our experiments demonstrate that, under the same conditions, our method outperforms state-of-the-art baselines in both convergence speed and accuracy on surface reconstruction and SDF field estimation. Moreover, we demonstrate the applicability of FINS to robot surface-following tasks and show its scalability to a variety of benchmark datasets.
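To make the notion of an implicit distance representation concrete, the following minimal sketch evaluates a signed distance function and checks the Eikonal property (unit gradient norm) that a valid SDF satisfies. It uses an analytic sphere as a stand-in for the learned network; the names `sphere_sdf`, `numerical_gradient`, and `eikonal_residual` are hypothetical and are not taken from the FINS implementation.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside.

    Assumption: an analytic shape stands in for the learned SDF network.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center))) - radius


def numerical_gradient(sdf, p, eps=1e-5):
    """Central-difference gradient of a scalar field at a 3D point p."""
    grad = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    return grad


def eikonal_residual(sdf, p):
    """Deviation of the gradient norm from 1; close to zero for a true distance field."""
    g = numerical_gradient(sdf, p)
    return abs(math.sqrt(sum(c * c for c in g)) - 1.0)


if __name__ == "__main__":
    query = (0.3, -1.2, 0.7)
    print("signed distance:", sphere_sdf(query))
    print("Eikonal residual:", eikonal_residual(sphere_sdf, query))
```

A learned field would replace `sphere_sdf` with a network evaluation; a residual like the one above is the quantity an Eikonal regularizer typically penalizes during training.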

Paper Structure

This paper contains 22 sections, 23 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: We present Fast Image-to-Neural Surface (FINS), an efficient framework ($\sim$10s on consumer-grade hardware) that can reconstruct high-fidelity surfaces and SDF fields from sparse images or even a single image. Top row: input RGB image of a statue (left), and the corresponding implicit representation enabling robot motion that traces the surface. Next two rows, from left to right: a single input image for SDF field reconstruction; the resulting mesh; the resulting colored mesh; the top view of the colored mesh; the trained SDF iso-contours corresponding to the top view.
  • Figure 2: We can leverage 3D foundation models to lift the skull image (shown in Figure 1) to a 3D point cloud (left), then use confidence estimates to filter and clean the point cloud (right).
  • Figure 3: Qualitative reconstruction results: implicit surfaces, visualized with marching cubes, generated from a single input image from the BlendedMVS and DTU datasets. We illustrate the input image, the resulting geometry without color, and the resulting geometry with color.
  • Figure 4: The implicit distance representation produces accurate iso-surfaces, which enable robot surface-tracing motion generation. The robot's motion can be generated by considering the normal and gradient vectors of iso-surfaces of the learned model, tracing the surface of reconstructions of the Statue and Head images. The red line denotes the Franka's end effector path.
  • Figure 5: Removing the Eikonal or off-surface loss term can lead to better surface reconstruction quality, but can also lead to poor contours away from the surface of the representation.
  • ...and 1 more figures
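
The Figure 4 caption describes generating surface-tracing motion from the normal and gradient vectors of the learned iso-surfaces. A minimal sketch of that idea, assuming a generic "tangent step, then project back" scheme rather than the authors' actual controller, is given below. An analytic sphere stands in for the learned SDF, and `unit_normal` and `trace_surface` are hypothetical names introduced for illustration.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Analytic stand-in for a learned SDF (assumption, not the FINS model)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def unit_normal(sdf, p, eps=1e-5):
    """Normalized central-difference gradient, i.e. the iso-surface normal at p."""
    g = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        g.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def trace_surface(sdf, start, direction, step=0.05, n_steps=20):
    """Follow the zero level set: step along the tangent, then project back.

    `direction` is a desired travel direction; its normal component is removed
    so that each step stays (to first order) on the surface.
    """
    p = list(start)
    path = [tuple(p)]
    for _ in range(n_steps):
        n = unit_normal(sdf, p)
        d_dot_n = sum(d * c for d, c in zip(direction, n))
        tangent = [d - d_dot_n * c for d, c in zip(direction, n)]
        t_norm = math.sqrt(sum(c * c for c in tangent))
        if t_norm < 1e-9:  # desired direction is parallel to the normal
            break
        p = [pi + step * ti / t_norm for pi, ti in zip(p, tangent)]
        # Correct the drift: move along the normal by the signed distance.
        n = unit_normal(sdf, p)
        dist = sdf(p)
        p = [pi - dist * ni for pi, ni in zip(p, n)]
        path.append(tuple(p))
    return path


if __name__ == "__main__":
    path = trace_surface(sphere_sdf, (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print("waypoints:", len(path))
    print("max |sdf| along path:", max(abs(sphere_sdf(p)) for p in path))
```

The projection step relies on the field being a true distance function near the surface, which is one reason accurate off-surface iso-contours matter for this kind of motion generation.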