UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Sets

Youngju Na, Woo Jae Kim, Kyu Beom Han, Suhyeon Ha, Sung-eui Yoon

TL;DR

This work addresses generalizable sparse-view surface reconstruction with arbitrary and unfavorable view sets by introducing the view-combination (VC) score to quantify input-set informativeness and proposing UFORecon, a framework that fuses cross-view matching transformers with cascaded correlation frustums and geometry-aware similarity priors. The approach explicitly models inter-view correlations, using a cross-view transformer, correlation frustums, and a reconstruction transformer to estimate implicit surfaces (SDF) from sparse viewpoints, and it trains with a random-set strategy to improve generalization. Empirical results on DTU show better performance than prior methods across favorable, normal, and unfavorable VC levels, with ablations showing the contributions of correlation frustums, similarity encoding, and depth supervision. The work targets real-world 3D reconstruction where view availability is unpredictable, recovering surface geometry without extensive per-scene optimization. VC-score-driven evaluation and cross-view priors underpin the improved generalizability and robustness in sparse-view reconstruction.

Abstract

Generalizable neural implicit surface reconstruction aims to obtain an accurate underlying geometry given a limited number of multi-view images from unseen scenes. However, existing methods select only informative and relevant views using predefined scores for training and testing phases. This constraint renders the model impractical in real-world scenarios, where the availability of favorable combinations cannot always be ensured. We introduce and validate a view-combination score to indicate the effectiveness of the input view combination. We observe that previous methods output degenerate solutions under arbitrary and unfavorable sets. Building upon this finding, we propose UFORecon, a robust view-combination generalizable surface reconstruction framework. To achieve this, we apply cross-view matching transformers to model interactions between source images and build correlation frustums to capture global correlations. Additionally, we explicitly encode pairwise feature similarities as view-consistent priors. Our proposed framework significantly outperforms previous methods in terms of view-combination generalizability and also in the conventional generalizable protocol trained with favorable view-combinations. The code is available at https://github.com/Youngju-Na/UFORecon.
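The abstract mentions encoding pairwise feature similarities as view-consistent priors. A minimal sketch of that idea, assuming the features of one 3D point sampled from each source view are stacked into a (num_views, channels) array and compared with cosine similarity, could look like the following. The function name, the array layout, and the choice of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pairwise_cosine_similarity(features, eps=1e-8):
    """Return a (V, V) matrix of cosine similarities between per-view features.

    features: array of shape (V, C), one feature vector per source view
              (hypothetical layout, assumed for illustration).
    eps:      lower bound on the norm to avoid division by zero.
    """
    features = np.asarray(features, dtype=np.float64)
    # Normalize each view's feature vector to unit length.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, eps)
    # Dot products of unit vectors are cosine similarities.
    return unit @ unit.T


if __name__ == "__main__":
    # Three views with 2-channel features; views 0 and 1 are orthogonal.
    feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(pairwise_cosine_similarity(feats))
```

Intuitively, a point on the true surface should project to similar features in the views that see it, so high pairwise similarity is a hint of view consistency; how the paper turns these similarities into a prior is described in its method section, not here.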

Paper Structure (28 sections, 16 equations, 11 figures, 9 tables)

Figures (11)

  • Figure 1: Reconstruction results from different view combinations. Both methods are trained only with the best-selected training protocol and tested with favorable (blue) and unfavorable (red) sets, respectively. VolRecon produces degenerate geometry on the unfavorable set while achieving accurate geometry on the favorable set. Our approach produces reasonable geometry on both sets.
  • Figure 2: Comparison of Chamfer Distance (CD) by view-combination (VC) score for generalizable implicit surface reconstruction methods. We define the VC score to represent the informativeness of a view combination for reconstruction. A higher VC score represents a more favorable combination. Our method shows better generalizability and accuracy than VolRecon across all VC scores. Our random set training further improves the view-combination generalizability.
  • Figure 3: Overall pipeline of UFORecon. Our cross-view matching transformer extracts cross-view matching features from multi-level image features of the given image set. Cross-view matching features are then represented as 3D volumes (i.e., cascaded correlation frustums) and as 2D features (i.e., geometry-aware similarity encoding). Our reconstruction transformer fuses the various representations of matching features and geometry features (i.e., depth) for volume rendering and color blending.
  • Figure 4: Qualitative results of surface reconstruction across various VC levels. The numbers in parentheses denote the view-combination score. Our method consistently outperforms previous methods at all levels and in different scenes. More qualitative results can be found in the Appendix.
  • Figure 5: Unfavorable 4-view tests on the BlendedMVS dataset.
  • ...and 6 more figures
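
Figure 2 reports reconstruction quality as Chamfer Distance (CD). As a rough illustration of the metric, the sketch below computes a symmetric Chamfer distance between two small point sets as the average of the two directed mean nearest-neighbor distances. This is a generic brute-force definition assumed for illustration; the DTU evaluation protocol used in the paper may differ in details such as point sampling, masking, and distance thresholds, and the function name is hypothetical.

```python
import numpy as np


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3).

    Uses a dense (N, M) distance matrix, so it is only suitable for small
    point sets; large clouds would need a KD-tree or similar structure.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Pairwise Euclidean distances between every predicted and reference point.
    diff = pred[:, None, :] - gt[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    # Directed terms: prediction-to-reference and reference-to-prediction.
    pred_to_gt = dists.min(axis=1).mean()
    gt_to_pred = dists.min(axis=0).mean()
    return 0.5 * (pred_to_gt + gt_to_pred)


if __name__ == "__main__":
    pred = np.array([[0.0, 0.0, 0.0]])
    gt = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    print(chamfer_distance(pred, gt))
```

Lower values indicate that the predicted surface points lie closer to the reference points and cover them more completely, which is why the curves in Figure 2 are read as "lower is better".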