Twinner: Shining Light on Digital Twins in a Few Snaps
Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotny
TL;DR
Twinner addresses digital twinning by enabling relighting and realistic rendering of objects from a few views. It introduces a memory-efficient tricolumn-based large reconstruction model that predicts geometry, PBR textures, and scene illumination, trained with synthetic data and real-world shading supervision. The key contributions include the tricolumn representation, procedurally generated PBR data, and a cubemap illumination predictor that allows learning from real data without ground-truth lighting. Experiments on StanfordORB show Twinner outperforms feed-forward baselines and rivals slow optimization methods in quality while being orders of magnitude faster, enabling practical digital twins.
Abstract
We present Twinner, the first large reconstruction model capable of recovering a scene's illumination as well as an object's geometry and material properties from only a few posed images. Twinner is based on the Large Reconstruction Model and innovates in three key ways: 1) We introduce a memory-efficient voxel-grid transformer whose memory scales only quadratically with the size of the voxel grid. 2) To deal with the scarcity of high-quality ground-truth PBR-shaded models, we introduce a large, fully synthetic dataset of procedurally generated PBR-textured objects lit with varied illumination. 3) To narrow the synthetic-to-real gap, we finetune the model on real-life datasets by means of a differentiable physically-based shading model, removing the need for ground-truth illumination or material properties, which are challenging to obtain in real life. We demonstrate the efficacy of our model on the real-life StanfordORB benchmark where, given few input views, we achieve reconstruction quality significantly superior to existing feed-forward reconstruction networks and comparable to significantly slower per-scene optimization methods.
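To give a sense of what quadratic rather than cubic scaling means in practice, the following minimal sketch compares token counts for a cubic voxel grid of side length n. It assumes, purely for illustration, that a dense voxel transformer uses one token per voxel and that a column-based representation uses one token per voxel column along each of three axes. The function names and this token accounting are hypothetical and are not taken from the Twinner implementation.

```python
def dense_voxel_tokens(n: int) -> int:
    """Assumed baseline: one token per voxel in an n x n x n grid (cubic growth)."""
    return n ** 3


def column_tokens(n: int, num_axes: int = 3) -> int:
    """Assumed column layout: one token per voxel column along each axis (quadratic growth)."""
    return num_axes * n ** 2


if __name__ == "__main__":
    # Print how the two token counts diverge as the grid resolution grows.
    for n in (16, 32, 64, 128):
        dense = dense_voxel_tokens(n)
        cols = column_tokens(n)
        print(f"n={n:4d}  dense={dense:9d}  columns={cols:7d}  ratio={dense / cols:.1f}")
```

Under these assumptions the ratio between the two counts is n / 3, so the saving grows linearly with grid resolution.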
