Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry

Xinhai Chang, Kaichen Zhou

TL;DR

EpiS reconstructs neural surfaces from sparse multi-view inputs by explicitly incorporating epipolar geometry. Rather than regressing geometry directly from coarse cost-volume statistics, it uses those statistics to guide an epipolar transformer that fuses view-dependent information along epipolar lines, followed by ray-wise aggregation to produce SDF-aware features for surface prediction. A geometry regularization strategy that leverages a pretrained monocular depth model through scale-invariant global and local constraints stabilizes fine-tuning under sparse views. Experiments on DTU and BlendedMVS show stronger sparse-view reconstruction than prior generalizable methods and strong generalization without per-scene optimization, which makes the approach practical for low-shot 3D reconstruction.

Abstract

Reconstructing accurate surfaces from sparse multi-view images remains challenging due to severe geometric ambiguity and occlusions. Existing generalizable neural surface reconstruction methods primarily rely on cost volumes that summarize multi-view features using simple statistics (e.g., mean and variance), which discard critical view-dependent geometric structure and often lead to over-smoothed reconstructions. We propose EpiS, a generalizable neural surface reconstruction framework that explicitly leverages epipolar geometry for sparse-view inputs. Instead of directly regressing geometry from cost-volume statistics, EpiS uses coarse cost-volume features to guide the aggregation of fine-grained epipolar features sampled along corresponding epipolar lines across source views. An epipolar transformer fuses multi-view information, followed by ray-wise aggregation to produce SDF-aware features for surface estimation. To further mitigate information loss under sparse views, we introduce a geometry regularization strategy that leverages a pretrained monocular depth model through scale-invariant global and local constraints. Extensive experiments on DTU and BlendedMVS demonstrate that EpiS significantly outperforms state-of-the-art generalizable surface reconstruction methods under sparse-view settings, while maintaining strong generalization without per-scene optimization.
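
To illustrate why mean and variance statistics lose view-dependent structure, the following minimal sketch computes per-channel statistics across source views for two toy samples. It is not taken from the paper: the function name `view_statistics` and the feature values are hypothetical, and real cost volumes operate on learned feature maps rather than hand-written lists. The point is only that these statistics are invariant to which view contributed which feature.

```python
import math
from statistics import fmean, pvariance


def view_statistics(per_view_features):
    """Per-channel mean and population variance across source views.

    per_view_features: list of V feature vectors (one per source view),
    each a list of C floats sampled at the same 3D point.
    """
    channels = list(zip(*per_view_features))  # C tuples, each with V values
    mean = [fmean(c) for c in channels]
    var = [pvariance(c) for c in channels]
    return mean, var


# Hypothetical toy data: the same three feature vectors, assigned to
# different source views in each sample.
sample_a = [[1.0, 0.0], [2.0, 0.0], [3.0, 6.0]]
sample_b = [[3.0, 6.0], [1.0, 0.0], [2.0, 0.0]]

stats_a = view_statistics(sample_a)
stats_b = view_statistics(sample_b)

# Compare the statistics channel by channel with a small tolerance.
same_stats = all(
    math.isclose(x, y)
    for part_a, part_b in zip(stats_a, stats_b)
    for x, y in zip(part_a, part_b)
)
print("statistics identical across view permutations:", same_stats)
```

Because the two samples differ only in the view-to-feature assignment, any model that sees just the mean and variance cannot tell them apart, which is the kind of information an attention mechanism over per-view features can retain.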

Paper Structure

This paper contains 28 sections, 12 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Reconstruction results on the DTU dataset. Our approach generalizes across varied scenes, reconstructing neural surfaces from only three source images through fast network inference. Notably, the reconstruction quality of this fast inference surpasses that of SparseNeuS in accuracy and fidelity. Our results can be further refined through per-scene fine-tuning. (All meshes are visualized with MeshLab 2022.)
  • Figure 2: Illustration of the pipeline. A ray in the target view is projected onto the source views to extract the epipolar features, and the distribution features (mean and variance) are extracted using a cost volume. The distribution features then serve as queries, and the epipolar features as keys and values, for cross-attention transformers that perform cross-view epipolar feature fusion. The fused features are the input to subsequent ray transformers, which aggregate features along the target ray. Finally, the resulting feature is passed to the geometry MLP and weight decoder to predict the corresponding signed distance function (SDF) values and multi-view color weights.
  • Figure 3: Visualization of our fine-tuning strategy designs. On the left, we present the predicted and ground truth depth maps. In the middle, we illustrate the triplet loss. On the right, we show the gradients along the X and Y axes of the images.
  • Figure 4: Visualization results on the DTU dataset. EpiS produces precise outcomes without requiring fine-tuning. Moreover, fine-tuning further enhances the realism of our results, which is evident in the comparison.
  • Figure 5: Reconstruction results on the BlendedMVS dataset. EpiS yields reasonably accurate estimation even without pre-training on BlendedMVS. Fine-tuning enhances EpiS's performance, leading to further improvements in accuracy.
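
The query/key/value roles described in the Figure 2 caption can be sketched with plain scaled dot-product attention. This is a minimal illustration, not the paper's implementation: it assumes a single head with no learned projections, and the names `cross_attention`, `distribution_feature`, and `epipolar_features`, along with all numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    queries: list of query vectors (here: distribution features).
    keys, values: lists of per-view vectors (here: epipolar features).
    Returns one fused vector per query.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity between the query and each source view's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the per-view values, channel by channel.
        fused.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return fused


# Hypothetical toy input: one sample point on the target ray, three source views.
distribution_feature = [0.2, 0.1, 0.05, 0.3]  # stands in for a mean/variance feature
epipolar_features = [
    [0.9, 0.1, 0.0, 0.4],
    [0.2, 0.8, 0.1, 0.3],
    [0.1, 0.2, 0.7, 0.5],
]

fused = cross_attention([distribution_feature], epipolar_features, epipolar_features)
print(fused[0])
```

Each fused vector is a convex combination of the per-view epipolar features, so views that match the query more closely contribute more. A trained model would typically add learned query, key, and value projections and multiple heads, and the ray transformer stage described in the caption would then aggregate such fused features along the target ray.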