Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry
Xinhai Chang, Kaichen Zhou
TL;DR
EpiS reconstructs surfaces from sparse multi-view images by building epipolar geometry directly into a generalizable neural surface reconstruction framework. Rather than regressing geometry directly from coarse cost-volume statistics, it uses those statistics to guide an epipolar transformer that fuses view-dependent features sampled along epipolar lines, followed by ray-wise aggregation to produce SDF-aware features for surface prediction. A geometry regularization strategy that leverages a pretrained monocular depth model, with scale-invariant global and local constraints, stabilizes fine-tuning under sparse views. Experiments on DTU and BlendedMVS show that EpiS outperforms state-of-the-art generalizable surface reconstruction methods in sparse-view settings and generalizes without per-scene optimization.
Abstract
Reconstructing accurate surfaces from sparse multi-view images remains challenging due to severe geometric ambiguity and occlusions. Existing generalizable neural surface reconstruction methods primarily rely on cost volumes that summarize multi-view features using simple statistics (e.g., mean and variance), which discard critical view-dependent geometric structure and often lead to over-smoothed reconstructions. We propose EpiS, a generalizable neural surface reconstruction framework that explicitly leverages epipolar geometry for sparse-view inputs. Instead of directly regressing geometry from cost-volume statistics, EpiS uses coarse cost-volume features to guide the aggregation of fine-grained epipolar features sampled along corresponding epipolar lines across source views. An epipolar transformer fuses multi-view information, followed by ray-wise aggregation to produce SDF-aware features for surface estimation. To further mitigate information loss under sparse views, we introduce a geometry regularization strategy that leverages a pretrained monocular depth model through scale-invariant global and local constraints. Extensive experiments on DTU and BlendedMVS demonstrate that EpiS significantly outperforms state-of-the-art generalizable surface reconstruction methods under sparse-view settings, while maintaining strong generalization without per-scene optimization.
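The epipolar sampling mentioned in the abstract rests on a standard fact: every 3D point along the ray through a reference pixel projects onto a single line (the epipolar line) in each source view. The following is a minimal NumPy sketch of that geometry only, not the EpiS implementation. The function names, the pinhole camera model, and the pose convention (a point X_ref in the reference camera frame maps to X_src = R X_ref + t in the source frame) are assumptions made for illustration.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental_matrix(K_ref, K_src, R, t):
    """Fundamental matrix F with x_src^T F x_ref = 0 (assumed convention: X_src = R X_ref + t)."""
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K_src).T @ E @ np.linalg.inv(K_ref)

def sample_epipolar_points(pixel_ref, K_ref, K_src, R, t, depths):
    """Back-project a reference pixel at several depths and project the points into the source view.

    Returns an (N, 2) array of source-view pixel coordinates, one per depth.
    """
    x_ref = np.array([pixel_ref[0], pixel_ref[1], 1.0])
    ray = np.linalg.inv(K_ref) @ x_ref          # ray direction in the reference camera frame
    pts_ref = depths[:, None] * ray[None, :]    # (N, 3) points along the ray
    pts_src = pts_ref @ R.T + t                 # (N, 3) same points in the source camera frame
    proj = pts_src @ K_src.T                    # homogeneous pixel coordinates
    return proj[:, :2] / proj[:, 2:3]

def epipolar_distance(uv_src, pixel_ref, F):
    """Pixel distance from each source-view point to the epipolar line of the reference pixel."""
    line = F @ np.array([pixel_ref[0], pixel_ref[1], 1.0])  # line coefficients (a, b, c)
    uv_h = np.concatenate([uv_src, np.ones((len(uv_src), 1))], axis=1)
    return np.abs(uv_h @ line) / np.hypot(line[0], line[1])
```

In a pipeline of the kind the abstract describes, the pixel locations returned by `sample_epipolar_points` would be where source-view feature maps are sampled (for example by bilinear interpolation) before fusion. `epipolar_distance` serves only as a sanity check that the sampled locations lie on the expected epipolar line.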
