WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting
Yifan Liu, Zhiyuan Min, Zhenwei Wang, Junta Wu, Tengfei Wang, Yixuan Yuan, Yawei Luo, Chunchao Guo
TL;DR
WorldMirror addresses the limitations of task-specific 3D methods with a universal, prior-aware feed-forward model for 3D reconstruction. Multi-Modal Prior Prompting fuses camera intrinsics, poses, and depth with image tokens, and a Universal Geometric Prediction head outputs point maps, depths, camera parameters, normals, and 3D Gaussians. The model is trained with curriculum learning and dynamic prior injection. Across diverse benchmarks including 7-Scenes, DTU, RealEstate10K, ScanNet, and VR-NeRF, it achieves state-of-the-art results while maintaining efficient inference. Ablations show the benefits of compact single-token priors and the necessity of 3DGS supervision and gradient-consistency losses. The framework advances unified 3D scene understanding and offers practical gains for multi-task geometry estimation and novel view synthesis.
Abstract
We present WorldMirror, an all-in-one, feed-forward model for versatile 3D geometric prediction tasks. Unlike existing methods constrained to image-only inputs or customized for a specific task, our framework flexibly integrates diverse geometric priors, including camera poses, intrinsics, and depth maps, while simultaneously generating multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussians. This elegant and unified architecture leverages available prior information to resolve structural ambiguities and delivers geometrically consistent 3D outputs in a single forward pass. WorldMirror achieves state-of-the-art performance across diverse benchmarks, from camera, point map, depth, and surface normal estimation to novel view synthesis, while maintaining the efficiency of feed-forward inference. Code and models will be publicly available soon.
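To make the idea of accepting any subset of priors concrete, the following is a minimal sketch of how randomly withholding priors during training could look. It is not the authors' implementation: the function names, the per-prior keep probability, and the independent sampling scheme are all assumptions made for illustration, since this summary does not specify how dynamic prior injection is carried out.

```python
import random

# Prior types named in the text; the dictionary keys are hypothetical.
PRIOR_NAMES = ("camera_pose", "intrinsics", "depth")


def sample_prior_subset(available, keep_prob=0.5, rng=random):
    """Keep each available prior independently with probability keep_prob.

    Assumption: independent per-prior sampling. The actual training
    schedule may differ.
    """
    return {
        name: value
        for name, value in available.items()
        if name in PRIOR_NAMES and rng.random() < keep_prob
    }


def build_model_inputs(images, priors):
    """Bundle images with whichever priors were kept; absent ones are None."""
    inputs = {"images": images}
    for name in PRIOR_NAMES:
        inputs[name] = priors.get(name)
    return inputs


if __name__ == "__main__":
    rng = random.Random(0)
    available = {"camera_pose": "pose", "intrinsics": "K", "depth": "D"}
    kept = sample_prior_subset(available, keep_prob=0.5, rng=rng)
    print(build_model_inputs(["img0", "img1"], kept))
```

Under this sketch, a model trained on such randomly reduced inputs sees every combination of priors, from image-only to fully specified, which is one plausible way to obtain a single network that works with whatever priors are available at inference time.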
