Reflect3r: Single-View 3D Stereo Reconstruction Aided by Mirror Reflections
Jing Wu, Zirui Wang, Iro Laina, Victor Adrian Prisacariu
TL;DR
Reflect3r addresses the challenge of single-view 3D reconstruction in scenes with mirrors by reframing mirror reflections as auxiliary views and constructing a physically valid virtual camera in pixel space. The approach integrates with existing feed-forward multi-view models (e.g., DUSt3R) and introduces a symmetric-aware loss to enforce geometric consistency between real and virtual poses, with an extension to dynamic scenes. A fully synthetic, editable Blender dataset with ground-truth real and virtual poses supports quantitative evaluation. Experiments on real and synthetic data show that Reflect3r achieves higher scene completeness, better accuracy, and lower Chamfer distances than strong baselines, demonstrating the practical value of leveraging mirror-induced stereo cues. The work provides a reusable framework and dataset for robust, low-cost 3D reconstruction in unconstrained environments.
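For readers unfamiliar with the Chamfer distance mentioned above, the following is a minimal brute-force sketch of one common definition: the mean nearest-neighbour distance from each point cloud to the other, summed over both directions. The function name `chamfer_distance` and the choice of unsquared, mean-normalised distances are assumptions made for illustration. Conventions vary (squared distances, averaging the two directions), and the summary does not say which variant the paper uses.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds.

    Assumed convention (not taken from the paper): mean unsquared
    nearest-neighbour distance a->b plus the same quantity b->a.
    Brute force, O(len(a) * len(b)); fine for small illustrative clouds.
    """
    def one_way(src, dst):
        # For every source point, find the distance to its closest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Two tiny clouds that differ in one point.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
cd = chamfer_distance(predicted, reference)
```

A lower value means the reconstructed cloud lies closer to the ground-truth cloud. For clouds of realistic size, a spatial index such as a k-d tree would replace the brute-force inner loop.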
Abstract
Mirror reflections are common in everyday environments and can provide stereo information within a single capture, as the real and reflected virtual views are visible simultaneously. We exploit this property by treating the reflection as an auxiliary view and designing a transformation that constructs a physically valid virtual camera, allowing direct pixel-domain generation of the virtual view while adhering to the real-world imaging process. This enables a multi-view stereo setup from a single image, which simplifies the imaging process and makes it compatible with powerful feed-forward reconstruction models for generalizable and robust 3D reconstruction. To further exploit the geometric symmetry introduced by mirrors, we propose a symmetric-aware loss to refine pose estimation. Our framework also naturally extends to dynamic scenes, where each frame contains a mirror reflection, enabling efficient per-frame geometry recovery. For quantitative evaluation, we provide a fully customizable synthetic dataset of 16 Blender scenes, each with ground-truth point clouds and camera poses. Extensive experiments on real-world and synthetic data demonstrate the effectiveness of our method.
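The idea of a virtual camera behind the mirror can be illustrated with standard planar-reflection geometry. The sketch below is not the paper's transformation, which the abstract does not specify. It assumes a known mirror plane n·x = d with unit normal n and a camera-to-world rotation, and all function names are hypothetical. Reflecting the camera with the Householder matrix H = I - 2nnᵀ gives an improper rotation (determinant -1). Negating the camera x-axis restores a proper rotation, which corresponds to flipping the reflected image region horizontally in pixel space.

```python
def reflect_point(p, n, d):
    """Reflect 3D point p across the plane {x : n.x = d}; n must be a unit normal."""
    s = sum(pi * ni for pi, ni in zip(p, n)) - d  # signed distance from p to the plane
    return [pi - 2.0 * s * ni for pi, ni in zip(p, n)]

def householder(n):
    """3x3 reflection matrix H = I - 2 n n^T for a unit normal n."""
    return [[(1.0 if i == j else 0.0) - 2.0 * n[i] * n[j] for j in range(3)]
            for i in range(3)]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix (used only as a sanity check)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def virtual_camera(R_c2w, center, n, d):
    """Return (rotation, center) of the mirrored camera.

    H @ R_c2w alone is a reflection, not a rotation; right-multiplying by
    diag(-1, 1, 1) negates the camera x-axis so the result is a valid pose.
    """
    flip_x = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    R_virtual = matmul(matmul(householder(n), R_c2w), flip_x)
    return R_virtual, reflect_point(center, n, d)


# Illustrative setup: mirror on the plane x = 2, real camera at the origin.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
normal, offset = [1.0, 0.0, 0.0], 2.0
R_v, c_v = virtual_camera(identity, [0.0, 0.0, 0.0], normal, offset)
```

Under these assumptions the virtual camera sits at the mirror image of the real camera center, and the horizontally flipped reflection crop is what that camera would observe of the real scene. The pair (real view, flipped reflection) can then be passed to a two-view reconstruction model as if it were an ordinary stereo pair.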
