
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface

Linyi Jin, Nilesh Kulkarni, David Fouhey

TL;DR

3DFIRES is a novel system for scene-level 3D reconstruction from posed images. With a single input view it matches the efficacy of single-view reconstruction methods, and for sparse-view 3D reconstruction it surpasses existing techniques in both quantitative and qualitative measures.

Abstract

This paper introduces 3DFIRES, a novel system for scene-level 3D reconstruction from posed images. Designed to work with as few as one view, 3DFIRES reconstructs the complete geometry of unseen scenes, including hidden surfaces. With multiple input views, our method produces a full reconstruction within all camera frustums. A key feature of our approach is the fusion of multi-view information at the feature level, which enables coherent and comprehensive 3D reconstruction. We train our system on non-watertight scans from a large-scale dataset of real scenes. We show that it matches the efficacy of single-view reconstruction methods with only one input and surpasses existing techniques in both quantitative and qualitative measures for sparse-view 3D reconstruction.

Paper Structure (22 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Reconstructing 3D from sparse posed images. Given a sparse set of posed image views, our method is able to reconstruct the full 3D geometry of the scene. On the top, we show two sparse views of the scene, View 1 and View 2. On the bottom left is the 3D reconstruction from our network in the frustum of View 1. We show that our method can generate the occluded side table (zoom in). On the bottom right is the full reconstruction. We color occluded surfaces with surface normals.
  • Figure 2: (a) Architecture for single-view DRDF [kulkarni2022directed]. Given an image and a query pixel location, it predicts the DRDF along the ray from the query pixel. (b) We extend (a) to work on sparse views. Middle: Given $N$ images, a query point $\mathbf{x}$, and a query direction $\vec{\mathbf{r}}_q$, we aggregate features from multiple images and output the DRDF along the query ray. Right: We show the detailed network architecture of 3DFIRES, which consists of a Query Encoder and a DRDF Predictor.
  • Figure 3: Predictions in the blue camera frustum. Occluded surfaces are colored with surface normals. A single-image-to-3D method such as DRDF [kulkarni2022directed] cannot reconstruct the parts of the scene behind the wall with certainty and hence erroneously adds a full wall in front of the hallway (red box). 3DFIRES, which fuses features from multiple views (green and purple cameras in Fig. 2), predicts empty space for the entrance (black box).
  • Figure 4: Comparison between different methods on held-out test scenes. Occluded surfaces are colored with the computed surface normals. "Depth only" leaves holes with sparse input views, e.g. absent floors and walls. The occupancy-based method MCC [wu2023mcc] produces cloudy results, failing to capture details such as pillows and tables. Concatenation of single-view DRDF (SV-DRDF) [kulkarni2022directed] produces inconsistent results, e.g. the missing wall in row 2 and the double wall in row 3. Our method produces more consistent predictions across different views and also recovers the hidden surfaces, resulting in a complete mesh. We urge the reader to see the results provided in the supplementary videos.
  • Figure 5: Qualitative results on held-out test scenes. Top row: Reconstructions from 3 input images, compared with ground truth. Our method can reconstruct a complete scene structure within all the camera frustums, including the occluded surfaces. Bottom row: Predictions from 5 input images, compared with ground truth. For the 2nd and 3rd examples, ceilings are removed to reveal the details of the scene.
  • ...and 2 more figures
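
The feature-level fusion described in the Figure 2 caption (project a query point into each of the $N$ posed images, gather per-view features, and combine them) can be illustrated with a minimal NumPy sketch. This is not the 3DFIRES Query Encoder: the masked mean used for fusion, the nearest-neighbor sampling, and every name here (`project_points`, `aggregate_features`, `Ks`, `Rs`, `ts`) are assumptions chosen for illustration, and a standard pinhole camera with world-to-camera extrinsics is assumed.

```python
import numpy as np

def project_points(points, K, R, t):
    """Project world-space points (M, 3) into one pinhole camera.

    Returns pixel coordinates (M, 2) and camera-space depth (M,).
    """
    cam = points @ R.T + t                      # world -> camera coordinates
    depth = cam[:, 2]
    # Avoid division by zero for points on the camera plane.
    safe = np.where(np.abs(depth) < 1e-8, 1e-8, depth)
    uv = (cam @ K.T)[:, :2] / safe[:, None]     # perspective divide
    return uv, depth

def aggregate_features(points, feats, Ks, Rs, ts):
    """Fuse per-view features at the projections of `points`.

    feats: (N, H, W, C) feature maps, one per view.
    Fusion is a masked mean over the views that see each point
    (an illustrative assumption, not the paper's learned fusion).
    Returns fused features (M, C) and the per-point view count (M,).
    """
    N, H, W, C = feats.shape
    total = np.zeros((points.shape[0], C))
    count = np.zeros(points.shape[0])
    for i in range(N):
        uv, depth = project_points(points, Ks[i], Rs[i], ts[i])
        col = np.round(uv[:, 0]).astype(int)    # nearest-neighbor sampling
        row = np.round(uv[:, 1]).astype(int)
        visible = (depth > 0) & (col >= 0) & (col < W) & (row >= 0) & (row < H)
        total[visible] += feats[i, row[visible], col[visible]]
        count += visible
    fused = total / np.maximum(count, 1)[:, None]
    return fused, count
```

Points along a query ray can be generated as `x + s * r` for a range of distances `s` and passed in as `points`; a point outside every frustum receives a zero feature and a view count of 0, which a downstream predictor would need to handle explicitly.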