
MSF-Net: Multi-Stage Feature Extraction and Fusion for Robust Photometric Stereo

Shiyu Qin, Zhihao Cai, Kaixuan Wang, Lin Qi, Junyu Dong

TL;DR

MSF-Net tackles photometric stereo by introducing a multi-stage feature extraction framework with an adaptive fusion mechanism and a selective update training strategy. This combination preserves non-maximal features, promotes cross-light interactions, and progressively refines feature quality to improve surface normal estimation under non-Lambertian conditions. Ablation studies and extensive benchmarks on DiLiGenT variants demonstrate strong mean angular error (MAE) performance with a compact 2.2M-parameter model, emphasizing efficiency and robustness across challenging materials. The work offers practical implications for accurate 3D surface reconstruction in real-world scenes and points to material-adaptive enhancements as a direction for future improvement.
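On DiLiGenT-style benchmarks, MAE is the mean angular error, in degrees, between predicted and ground-truth surface normals over the object mask. A minimal NumPy sketch, assuming normal maps stored as (H, W, 3) arrays and a boolean (H, W) mask; the function name `mean_angular_error` is hypothetical and not taken from any released evaluation code:

```python
import numpy as np


def mean_angular_error(pred, gt, mask):
    """Mean angular error in degrees over masked pixels (hypothetical helper).

    pred, gt: (H, W, 3) normal maps; mask: (H, W) boolean object mask.
    """
    # Select object pixels first so zero-valued background normals
    # never reach the normalization below.
    p, g = pred[mask], gt[mask]  # (K, 3) each
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Clip the cosine to [-1, 1] so rounding cannot push arccos out of range.
    cos = np.clip(np.sum(p * g, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())


gt = np.zeros((1, 2, 3))
gt[..., 2] = 1.0                                        # both pixels face +z
pred = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])   # 0 and 90 degrees off
mask = np.ones((1, 2), dtype=bool)
print(mean_angular_error(pred, gt, mask))  # average of a 0 and a 90 degree error
```

Normalizing both maps before the dot product makes the metric insensitive to the magnitude of the predicted vectors, so only their direction is scored.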

Abstract

Photometric stereo estimates surface normals from the shading cues in images captured under different lighting conditions. However, existing learning-based approaches often fail to accurately capture features at multiple stages and do not adequately promote interaction between these features. Consequently, these models tend to extract redundant features, especially in areas with intricate details such as wrinkles and edges. To tackle these issues, we propose MSF-Net, a novel framework that extracts information at multiple stages and is paired with a selective update strategy, with the aim of obtaining the high-quality features that accurate normal estimation requires. We also develop a feature fusion module that strengthens the interaction among different features. Experimental results on the DiLiGenT benchmark show that our proposed MSF-Net significantly surpasses previous state-of-the-art methods in the accuracy of surface normal estimation.
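To make "preserving non-maximal features" concrete: learning-based methods such as PS-FCN fuse a variable number of per-light features with channel-wise max-pooling, so any light that is not the maximum for a given channel is discarded. The sketch below contrasts that with a hypothetical attention-style fusion (a softmax-weighted average concatenated with the max). It is an illustration under stated assumptions, not MSF-Net's fusion module; the scoring vector `w` stands in for learned parameters, and all function names are hypothetical.

```python
import numpy as np


def max_pool_fusion(feats):
    # feats: (N_lights, C) features for one pixel, one row per light.
    # Channel-wise max keeps only the strongest light per channel;
    # every non-maximal response is dropped.
    return feats.max(axis=0)


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def adaptive_fusion(feats, w):
    # Hypothetical alternative: score each light with a (learned) vector w,
    # weight all lights by a softmax over those scores, and concatenate the
    # weighted average with the max so the dominant response is still kept.
    alpha = softmax(feats @ w)   # (N_lights,), sums to 1
    weighted = alpha @ feats     # (C,), every light contributes
    return np.concatenate([feats.max(axis=0), weighted])


f1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
f2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # same maxima, different third light
w = np.ones(2)
print(max_pool_fusion(f1), max_pool_fusion(f2))       # both print [1. 1.]
print(adaptive_fusion(f1, w), adaptive_fusion(f2, w))  # weighted halves differ
```

Both fusions are invariant to the order of the input lights, which is what lets such networks accept an arbitrary number of images; only the weighted branch reacts when a non-maximal light changes.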

Paper Structure

This paper contains 19 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Network architecture of the proposed MSF-Net. The final normal map is determined solely by the output of the deep extractor.
  • Figure 2: Architecture of the shared-weight multi-stage feature extractor.
  • Figure 3: (a) Qualitative results at different stages, showing that the quality of the predicted normals improves as the stage level increases. (b) Results of our model on the DiLiGenT10² dataset; each element of the matrix gives an MAE in degrees.
  • Figure 4: Qualitative results for the objects "Buddha", "Harvest" and "Reading" from the DiLiGenT benchmark dataset with 96 input images. The number below each error map is the MAE in degrees. Compared with PS-FCN (N.) [psfcn-n_Chen2020DeepPS], MF-PSN [mf_psn-Liu_Ju_Jian_Gao_Rao_Hu_Dong_2022], NA-PSN [NA_PSN-Ju_Shi_Jian_Qi_Dong_Lam_Kenneth], GR-PSN [GR_PSN-10306333], and PX-Net [PX_Net-Logothetis_Budvytis_Mecca_Cipolla_2020], our model achieves the best or second-best performance.
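Figures 1 and 2 describe a shared-weight, multi-stage extractor whose final normal map comes only from the deepest stage. The toy NumPy model below illustrates those two properties under heavy assumptions: each light's observation is a 6-vector (RGB intensity concatenated with the light direction, as in PS-FCN-style inputs), every stage is a single random linear layer with ReLU, per-stage fusion is plain max-pooling, and the class name `SharedMultiStageExtractor` is hypothetical. It is a sketch, not the trained MSF-Net.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class SharedMultiStageExtractor:
    """Toy per-pixel model with random weights (hypothetical, not MSF-Net).

    Every stage applies the SAME weight matrix to each light's features
    (shared weights), and each stage refines the previous stage's output.
    """

    def __init__(self, in_dim=6, hidden=16, n_stages=3, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + [hidden] * n_stages
        self.stages = [0.5 * rng.standard_normal((dims[i], dims[i + 1]))
                       for i in range(n_stages)]
        # Head that decodes the fused deepest-stage features into a normal.
        self.head = rng.standard_normal((hidden, 3))

    def __call__(self, obs):
        # obs: (N_lights, in_dim); assumed to be RGB intensity + light direction.
        feats = obs
        fused_per_stage = []
        for W in self.stages:
            feats = relu(feats @ W)                    # same W for every light
            fused_per_stage.append(feats.max(axis=0))  # order-invariant fusion
        # Only the deepest stage is decoded into the final normal.
        n = fused_per_stage[-1] @ self.head
        return n / (np.linalg.norm(n) + 1e-12), fused_per_stage


model = SharedMultiStageExtractor()
obs = np.random.default_rng(1).standard_normal((10, 6))  # 10 lights
normal, stages = model(obs)
normal_rev, _ = model(obs[::-1])                         # reversed light order
print(np.allclose(normal, normal_rev))  # True
```

Because the same stage weights are applied to every light and the fusion is symmetric, reversing the light order leaves the predicted normal unchanged. The intermediate fused features are returned only for inspection; the normal itself is decoded from the last stage alone.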