MSF-Net: Multi-Stage Feature Extraction and Fusion for Robust Photometric Stereo
Shiyu Qin, Zhihao Cai, Kaixuan Wang, Lin Qi, Junyu Dong
TL;DR
MSF-Net addresses photometric stereo with a multi-stage feature extraction framework, an adaptive fusion mechanism, and a selective update training strategy. Together, these preserve non-maximal features, promote interaction across lighting directions, and progressively refine feature quality, improving surface normal estimation under non-Lambertian conditions. Ablation studies and benchmarks on DiLiGenT variants show low mean angular error (MAE) with a compact 2.2M-parameter model, indicating that the method is both efficient and robust across challenging materials. The work has practical implications for accurate 3D surface reconstruction in real-world scenes and points to material-adaptive enhancements as a direction for future work.
Abstract
Photometric stereo estimates surface normals from shading cues in images captured under different lighting conditions. However, existing learning-based approaches often fail to capture features accurately at multiple stages and do not adequately promote interaction between these features. Consequently, these models tend to extract redundant features, especially in areas with intricate details such as wrinkles and edges. To tackle these issues, we propose MSF-Net, a novel framework that extracts information at multiple stages and is paired with a selective update strategy, so that it obtains the high-quality feature information that is critical for accurate normal reconstruction. Additionally, we develop a feature fusion module to improve the interplay among different features. Experimental results on the DiLiGenT benchmark show that MSF-Net significantly surpasses previous state-of-the-art methods in the accuracy of surface normal estimation.
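For readers new to the problem setting, the following is a minimal sketch of the classical Lambertian least-squares formulation of photometric stereo, together with the mean angular error commonly used to score normal estimates. It is not MSF-Net and not the authors' code: it assumes known, distant light directions, no shadows or specular highlights, and the function names `lambertian_normals` and `mean_angular_error_deg` are hypothetical names chosen for this illustration.

```python
import numpy as np


def lambertian_normals(intensities, light_dirs):
    """Classical least-squares photometric stereo (illustrative baseline only).

    intensities: (m, p) array, one row per light, one column per pixel.
    light_dirs:  (m, 3) array of unit light directions (assumed known).
    Returns unit normals of shape (p, 3) and per-pixel albedo of shape (p,).
    """
    # Lambertian model: I = L @ g, with g = albedo * normal for each pixel.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, p)
    albedo = np.linalg.norm(g, axis=0)
    # Guard against division by zero for pixels with no signal.
    normals = (g / np.maximum(albedo, 1e-12)).T
    return normals, albedo


def mean_angular_error_deg(pred, gt):
    """Mean angle in degrees between predicted and ground-truth unit normals."""
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


if __name__ == "__main__":
    # Synthetic check: render shadow-free Lambertian intensities, then recover normals.
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(100, 3))
    gt[:, 2] = np.abs(gt[:, 2]) + 3.0  # keep normals facing the camera
    gt /= np.linalg.norm(gt, axis=1, keepdims=True)

    lights = np.array([[0.0, 0.0, 1.0], [0.3, 0.0, 1.0],
                       [0.0, 0.3, 1.0], [-0.3, -0.3, 1.0]])
    lights /= np.linalg.norm(lights, axis=1, keepdims=True)

    images = lights @ gt.T  # unit albedo, no noise
    pred, _ = lambertian_normals(images, lights)
    print("mean angular error (deg):", mean_angular_error_deg(pred, gt))
```

On real, non-Lambertian surfaces this linear model breaks down at highlights and shadows, which is the regime that learning-based methods such as the one proposed here are designed to handle.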
