Physics-informed Active Polarimetric 3D Imaging for Specular Surfaces
Jiazhang Wang, Hyelim Yang, Tianyi Wang, Florian Willomitzer
TL;DR
This paper addresses fast, accurate single-shot 3D imaging of specular surfaces with a physics-informed deep learning framework that combines polarization priors with geometric cues. A dual-encoder network with FiLM-based cross-modal modulation fuses Stokes/DoLP-derived cues with a coarse camera–screen correspondence to estimate surface normals robustly. The key contribution is a two-stage architecture that (i) derives coarse depth/normals from polarimetric inputs and (ii) adaptively weights geometric information via FiLM to mitigate error propagation. The method achieves a mean angular error of $0.79^\circ$ with $8\,\mathrm{ms}$ inference on unseen objects, significantly outperforming conventional polarimetric methods. It is trained on synthetic Mitsuba3-based data and demonstrated with a real prototype, targeting practical 3D imaging of complex specular surfaces in dynamic environments. Future work includes broader materials and sensor-level modeling.
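As a rough illustration of the FiLM (feature-wise linear modulation) idea named above, the sketch below applies a per-channel scale and shift, predicted from one modality, to the feature map of the other. It is a generic NumPy sketch, not the authors' network: the function name `film_modulate`, the tensor shapes, the random weights, and the choice to let a polarization embedding modulate geometric features are all assumptions made for illustration.

```python
import numpy as np


def film_modulate(features, conditioning, w_gamma, b_gamma, w_beta, b_beta):
    """Apply feature-wise linear modulation (FiLM) to a feature map.

    features:        (C, H, W) feature map from one encoder (assumed: geometric branch)
    conditioning:    (D,) embedding of the other modality (assumed: polarization branch)
    w_gamma, w_beta: (C, D) hypothetical learned projection weights
    b_gamma, b_beta: (C,) hypothetical learned biases
    """
    gamma = w_gamma @ conditioning + b_gamma  # per-channel scale, shape (C,)
    beta = w_beta @ conditioning + b_beta     # per-channel shift, shape (C,)
    # Broadcast the per-channel scale and shift over the spatial dimensions.
    return gamma[:, None, None] * features + beta[:, None, None]


# Toy example with random stand-ins for encoder outputs and learned weights.
rng = np.random.default_rng(0)
C, D, H, W = 8, 4, 16, 16
geo_features = rng.standard_normal((C, H, W))
pol_embedding = rng.standard_normal(D)
modulated = film_modulate(
    geo_features,
    pol_embedding,
    w_gamma=0.1 * rng.standard_normal((C, D)),
    b_gamma=np.ones(C),
    w_beta=0.1 * rng.standard_normal((C, D)),
    b_beta=np.zeros(C),
)
print(modulated.shape)
```

With zero projection weights, unit scale bias, and zero shift bias, the layer reduces to the identity, which is one reason FiLM is a convenient way to let a network learn how strongly one cue should influence another.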
Abstract
3D imaging of specular surfaces remains challenging in real-world scenarios such as in-line inspection or hand-held scanning, which require fast and accurate measurement of complex geometries. Optical metrology techniques such as deflectometry achieve high accuracy but typically rely on multi-shot acquisition, making them unsuitable for dynamic environments. Fourier-based single-shot approaches alleviate this constraint, yet their performance deteriorates when measuring surfaces with high-spatial-frequency structure or large curvature. Alternatively, polarimetric 3D imaging in computer vision operates in a single-shot fashion and is robust to geometric complexity. However, its accuracy is fundamentally limited by the orthographic imaging assumption. In this paper, we propose a physics-informed deep learning framework for single-shot 3D imaging of complex specular surfaces. Polarization cues provide orientation priors that assist in interpreting geometric information encoded by structured illumination. These complementary cues are processed through a dual-encoder architecture with mutual feature modulation, allowing the network to resolve their nonlinear coupling and directly infer surface normals. The proposed method achieves accurate and robust normal estimation from a single shot with fast inference, enabling practical 3D imaging of complex specular surfaces.
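To make the polarization cues more concrete, the sketch below computes the linear Stokes parameters, the degree of linear polarization (DoLP), and the angle of linear polarization (AoLP) from four intensity images. It assumes intensities captured behind linear polarizers at 0°, 45°, 90°, and 135°, as in a typical division-of-focal-plane polarization camera. The summary above does not specify the sensor or preprocessing, so the function name `linear_stokes` and this acquisition scheme are illustrative assumptions.

```python
import numpy as np


def linear_stokes(i0, i45, i90, i135, eps=1e-8):
    """Linear Stokes parameters, DoLP, and AoLP from four polarizer-angle images.

    Assumes ideal linear polarizers at 0, 45, 90, and 135 degrees.
    """
    i0, i45, i90, i135 = (np.asarray(x, dtype=float) for x in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # +45 vs. -45 degree component
    # Guard against division by zero in dark pixels.
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, eps)
    aolp = 0.5 * np.arctan2(s2, s1)     # radians, in (-pi/2, pi/2]
    return s0, s1, s2, dolp, aolp


# Synthetic check: fully linearly polarized light at 30 degrees with unit
# intensity, using I(alpha) = 0.5 * (1 + cos(2 * (alpha - theta))).
theta = np.radians(30.0)
angles = np.radians([0.0, 45.0, 90.0, 135.0])
intensities = 0.5 * (1.0 + np.cos(2.0 * (angles - theta)))
s0, s1, s2, dolp, aolp = linear_stokes(*intensities)
print(float(dolp), float(np.degrees(aolp)))
```

For this fully polarized test signal the DoLP evaluates to 1 and the AoLP recovers the 30° input angle, up to floating-point error. For specular reflection, the AoLP constrains the azimuth of the surface normal and the DoLP relates to its zenith angle, which is why these quantities can serve as orientation priors.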
