Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images
Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, Wenping Wang
TL;DR
This work addresses monocular depth and surface-normal estimation by introducing an Adaptive Surface Normal (ASN) constraint that enforces depth-normal consistency through a learned geometric context. The method samples local point triplets to generate multiple normal candidates and adaptively weights them by a geometry-aware confidence and an area-based factor. A geometric context-guided normal estimator then refines normals in detail-rich regions. A transformer-based network with depth, guidance, and normal decoders learns to predict coherent 3D structure and high-fidelity point clouds on indoor and outdoor datasets, outperforming state-of-the-art methods on depth, normal, and 3D geometry metrics. The approach offers robust, efficient 3D reconstruction from monocular images and highlights the value of explicit geometric context in guiding both depth and normal estimation for practical applications in 3D vision and robotics.
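The triplet sampling and adaptive weighting can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The window radius, the number of triplets, and the camera-facing orientation rule are placeholders. The confidence function is entirely hypothetical: it applies a Gaussian kernel to the distance between made-up per-pixel guidance features and multiplies the result over a triplet's three vertices. This summary does not give the paper's exact confidence formula.

```python
import numpy as np


def sample_triplet_normals(points, center, rng, num_triplets=16, radius=3):
    """Draw random point triplets near `center` and turn each into a normal candidate.

    points: (H, W, 3) camera-space points, e.g. back-projected from a depth map.
    center: (row, col) tuple.
    Returns unit normals (K, 3), triangle areas (K,), and triplet pixels (K, 3, 2).
    """
    h, w, _ = points.shape
    r0, c0 = center
    # Square window around the centre pixel, clipped to the image bounds.
    rows = np.arange(max(r0 - radius, 0), min(r0 + radius + 1, h))
    cols = np.arange(max(c0 - radius, 0), min(c0 + radius + 1, w))
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    coords = np.stack([rr.ravel(), cc.ravel()], axis=1)      # (M, 2)
    # Three distinct window pixels per triplet.
    idx = np.stack([rng.choice(len(coords), size=3, replace=False)
                    for _ in range(num_triplets)])
    tri_pixels = coords[idx]                                 # (K, 3, 2)
    p = points[tri_pixels[..., 0], tri_pixels[..., 1]]       # (K, 3, 3)
    cross = np.cross(p[:, 1] - p[:, 0], p[:, 2] - p[:, 0])   # unnormalised normals
    length = np.linalg.norm(cross, axis=1)
    areas = 0.5 * length                                     # triangle areas
    normals = cross / np.maximum(length, 1e-12)[:, None]     # collinear -> ~zero vector
    # Orient candidates toward the camera, assumed to sit at the origin.
    flip = np.sum(normals * p[:, 0], axis=1) > 0
    normals[flip] *= -1
    return normals, areas, tri_pixels


def context_confidence(guidance, center, tri_pixels, sigma=1.0):
    """HYPOTHETICAL confidence: similarity of each vertex's guidance feature to the
    centre pixel's (Gaussian kernel), multiplied over the triplet's three vertices."""
    g_center = guidance[center]                               # (C,)
    g_tri = guidance[tri_pixels[..., 0], tri_pixels[..., 1]]  # (K, 3, C)
    dist2 = np.sum((g_tri - g_center) ** 2, axis=-1)          # (K, 3)
    return np.exp(-dist2 / (2.0 * sigma ** 2)).prod(axis=1)   # (K,)


def aggregate_normal(normals, areas, confidences):
    """Weight each unit candidate by area * confidence and renormalise the sum."""
    weights = areas * confidences
    n = (weights[:, None] * normals).sum(axis=0)
    length = np.linalg.norm(n)
    return n / length if length > 0 else np.zeros(3)


# Usage: a fronto-parallel plane at depth 1, so every valid candidate is (0, 0, -1).
grid_rows, grid_cols = np.mgrid[0:11, 0:11]
plane = np.stack([0.1 * grid_cols, 0.1 * grid_rows, np.ones((11, 11))], axis=-1)
guidance = np.zeros((11, 11, 4))  # stand-in for learned guidance features
rng = np.random.default_rng(0)
normals, areas, tri = sample_triplet_normals(plane, (5, 5), rng)
conf = context_confidence(guidance, (5, 5), tri)
normal = aggregate_normal(normals, areas, conf)
```

Weighting by triangle area suppresses near-degenerate (almost collinear) triplets, whose cross products are dominated by noise. The confidence term lets a context signal down-weight triplets that straddle a geometric discontinuity.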
Abstract
We introduce a novel approach to learning geometry, such as depth and surface normals, from images while incorporating geometric context. Existing methods struggle to capture geometric context reliably, which limits their ability to enforce consistency between different geometric properties and creates a bottleneck in geometric estimation quality. We therefore propose the Adaptive Surface Normal (ASN) constraint, a simple yet efficient method. Our approach extracts geometric context that encodes the geometric variations present in the input image and uses it to couple depth estimation with geometric constraints. We build the surface normal constraint by dynamically selecting reliable local geometry from randomly sampled candidates, whose validity is evaluated using the geometric context. Furthermore, our normal estimation uses the geometric context to prioritize regions with significant geometric variation, so that the predicted normals accurately capture intricate geometric detail. By integrating geometric context, our method unifies depth and surface normal estimation within a single framework, enabling the generation of high-quality 3D geometry from images. Extensive evaluations and comparisons on diverse indoor and outdoor datasets demonstrate that our approach outperforms state-of-the-art methods and confirm its efficiency and robustness.
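Both the surface normal constraint and the final point clouds depend on lifting a predicted depth map into 3D. The NumPy sketch below assumes a standard pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, which this summary does not specify. It also includes one common way to score depth-normal agreement, the mean of (1 - cosine similarity). That score is an illustrative assumption, not the paper's loss.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points (pinhole model)."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")  # v: row, u: column
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def normal_consistency(n_pred, n_depth, eps=1e-8):
    """Mean (1 - cosine similarity) between two (H, W, 3) normal maps; 0 means agreement."""
    a = n_pred / (np.linalg.norm(n_pred, axis=-1, keepdims=True) + eps)
    b = n_depth / (np.linalg.norm(n_depth, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))


# Usage: a constant-depth map and two identical camera-facing normal maps.
points = backproject_depth(np.full((4, 5), 2.0), fx=1.0, fy=1.0, cx=2.0, cy=1.5)
facing = np.broadcast_to(np.array([0.0, 0.0, -1.0]), (4, 5, 3))
score = normal_consistency(facing, facing)
```

In a full pipeline, the normals compared here would come from the depth-derived point triplets on one side and from the normal decoder on the other. This sketch only shows the geometric bookkeeping.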
