Geometry Meets Light: Leveraging Geometric Priors for Universal Photometric Stereo under Limited Multi-Illumination Cues

King-Man Tam, Satoshi Ikehata, Yuta Asano, Zhaoyi An, Rei Kawakami

TL;DR

GeoUniPS tackles universal photometric stereo (PS) under limited multi-illumination cues by leveraging high-level geometric priors from pretrained 3D reconstruction models. It introduces a Light-Geometry Dual-Branch Encoder that fuses illumination-aware features with geometric priors, and a PS-Perp training dataset rendered with perspective projection so that the network can learn spatially varying view directions. The approach achieves state-of-the-art performance on standard orthographic and perspective benchmarks and shows strong qualitative results in complex in-the-wild scenes, especially when lighting variation is weak. The work highlights the value of large-scale pretrained 3D reconstruction models as visual-geometry foundation models for PS, enabling robust normal recovery across diverse real-world conditions.

Abstract

Universal Photometric Stereo is a promising approach for recovering surface normals without strict lighting assumptions. However, it struggles when multi-illumination cues are unreliable, such as under biased lighting or in shadowed or self-occluded regions of complex in-the-wild scenes. We propose GeoUniPS, a universal photometric stereo network that integrates synthetic supervision with high-level geometric priors from large-scale 3D reconstruction models pretrained on massive in-the-wild data. Our key insight is that these 3D reconstruction models serve as visual-geometry foundation models, inherently encoding rich geometric knowledge of real scenes. To leverage this, we design a Light-Geometry Dual-Branch Encoder that extracts both multi-illumination cues and geometric priors from the frozen 3D reconstruction model. We also address the limitations of the conventional orthographic projection assumption by introducing the PS-Perp dataset with realistic perspective projection to enable learning of spatially varying view directions. Extensive experiments demonstrate that GeoUniPS delivers state-of-the-art performance across multiple datasets, both quantitatively and qualitatively, especially in complex in-the-wild scenes.
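
The contrast between the orthographic assumption and spatially varying view directions can be illustrated with a small sketch. This is not code from the paper: it assumes a standard pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` (in pixels), and the function name `view_directions` is made up for illustration. Under orthographic projection every pixel shares the same viewing ray, whereas under perspective projection the ray through pixel (u, v) depends on its offset from the principal point.

```python
import math

def view_directions(width, height, fx, fy, cx, cy, perspective=True):
    """Return a height x width grid of unit rays from the camera through each pixel.

    Assumed convention (not from the paper): camera looks along +z, and the
    view vector pointing back toward the camera is the negation of each ray.
    """
    grid = []
    for v in range(height):
        row = []
        for u in range(width):
            if perspective:
                # Back-project the pixel through the pinhole model, then normalize.
                x = (u - cx) / fx
                y = (v - cy) / fy
                norm = math.sqrt(x * x + y * y + 1.0)
                row.append((x / norm, y / norm, 1.0 / norm))
            else:
                # Orthographic assumption: one constant ray for the whole image.
                row.append((0.0, 0.0, 1.0))
        grid.append(row)
    return grid

# Hypothetical 5x5 image with the principal point at the center pixel.
persp = view_directions(5, 5, fx=10.0, fy=10.0, cx=2.0, cy=2.0)
ortho = view_directions(5, 5, fx=10.0, fy=10.0, cx=2.0, cy=2.0, perspective=False)
print(persp[2][2])  # center pixel: (0.0, 0.0, 1.0), same as the orthographic ray
print(persp[0][0])  # corner pixel: tilted away from the optical axis
print(ortho[0][0])  # (0.0, 0.0, 1.0) everywhere
```

The two models agree only at the principal point; the deviation grows toward the image borders and with shorter focal lengths, which is the regime where a network trained purely on orthographic renderings would see a mismatch.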

Paper Structure

This paper contains 20 sections, 5 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Our method effectively leverages geometric priors from a pretrained 3D reconstruction model, achieving more plausible normal map recovery in challenging scenes with complex backgrounds and limited lighting variation. Compared to SoTA monocular normal prediction models (e.g., MoGe-2 [wang2025moge2]), our approach captures finer surface details by incorporating multi-illumination cues.
  • Figure 2: Overview of our GeoUniPS architecture. Given multiple input images captured under different lighting conditions, the Light-Geometry Dual-Branch Encoder extracts both light-variant features from multi-illumination cues ($\text{Encoder}_\text{IL}$) and geometric features from the pretrained VGGT aggregator ($\text{Encoder}_\text{Geo}$). These features are concatenated with the input images using an MLP-based embedding, after which the Dual-Scale Normal Decoder performs pixel-wise normal regression at sampled locations.
  • Figure 3: Qualitative comparison on the Multi-illumination Dataset [multiill19].
  • Figure 4: Results for scenes under limited lighting cues, excluding object masks.