Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects

Tianhang Cheng; Wei-Chiu Ma; Kaiyu Guan; Antonio Torralba; Shenlong Wang

TL;DR

Structure from Duplicates (SfD) tackles inverse rendering from a single image by leveraging duplicates to extract multi-view information. It jointly estimates 6-DoF poses for all duplicates and uses a shared geometry/material framework to recover geometry, material properties, and environment illumination; the pipeline integrates an in-plane rotation-robust SfM step with a neural SDF-based geometry, a Disney BRDF, and spherical-Gaussian lighting. The approach demonstrates strong performance on synthetic and real data, surpassing baselines in geometry and material detail, and enables practical tasks like relighting and object insertion from a single view. The work highlights the value of repetition priors for regularizing ill-posed inverse problems and offers insights into efficient multi-view-like reconstruction from a single image.
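The TL;DR names spherical-Gaussian (SG) lighting as the illumination model. The sketch below shows how an SG environment map is commonly evaluated, assuming the standard form L(ω) = Σ_k μ_k exp(λ_k (ω·ξ_k − 1)) with lobe axis ξ_k, sharpness λ_k, and RGB amplitude μ_k. The summary does not state SfD's exact parameterization or number of lobes, and the function name `eval_sg_lighting` is hypothetical.

```python
import numpy as np


def eval_sg_lighting(directions, lobe_axes, sharpness, amplitudes):
    """Evaluate an RGB environment map stored as a mixture of K spherical Gaussians.

    directions: (N, 3) query directions omega (normalized below)
    lobe_axes:  (K, 3) lobe axes xi_k (normalized below)
    sharpness:  (K,)   lobe sharpness lambda_k > 0
    amplitudes: (K, 3) RGB amplitudes mu_k
    returns:    (N, 3) radiance L(omega) = sum_k mu_k * exp(lambda_k * (omega . xi_k - 1))
    """
    directions = directions / np.linalg.norm(directions, axis=-1, keepdims=True)
    lobe_axes = lobe_axes / np.linalg.norm(lobe_axes, axis=-1, keepdims=True)
    cos = directions @ lobe_axes.T                       # (N, K) cosine to each lobe axis
    weights = np.exp(sharpness[None, :] * (cos - 1.0))  # (N, K); 1 at the lobe center, decays away from it
    return weights @ amplitudes                          # (N, 3) sum of weighted lobe colors


# Usage: a single reddish lobe pointing straight up (+z), queried up and down.
dirs = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
radiance = eval_sg_lighting(dirs, np.array([[0.0, 0.0, 1.0]]),
                            np.array([10.0]), np.array([[1.0, 0.5, 0.25]]))
# radiance[0] equals the lobe amplitude; radiance[1] is that amplitude scaled by exp(-20).
```

A common reason for choosing SGs in inverse rendering is that the product of two SGs is again an SG and its spherical integral has a closed form, which keeps shading against a smooth environment map cheap and differentiable.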

Abstract

Our world is full of identical objects (e.g., cans of coke, cars of the same model). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multiple identical objects. SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6-DoF pose for all instances. An inverse graphics pipeline is subsequently employed to jointly reason about the shape and material of the object and the environment light, while adhering to the shared geometry and material constraint across instances. Our primary contributions are utilizing object duplicates as a robust prior for single-image inverse graphics and proposing an in-plane rotation-robust Structure from Motion (SfM) formulation for joint 6-DoF object pose estimation. By leveraging multi-view cues from a single image, SfD generates more realistic and detailed 3D reconstructions, significantly outperforming existing single-image reconstruction models and multi-view reconstruction approaches with a similar or greater number of observations.
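Why duplicates behave like multiple views: if instance i has an object-to-camera pose T_i = [R_i | t_i], so that x_cam = R_i x_obj + t_i, then in the frame of one shared canonical object the single real camera appears at a different pose for every instance. The minimal sketch below converts per-instance poses into such virtual cameras and projects a canonical point through each of them, assuming a pinhole intrinsic matrix K. The function names (`rigid_inverse`, `instances_to_virtual_cameras`, `project`) are hypothetical; this illustrates the shared-geometry constraint only and is not SfD's pose estimator, which the abstract describes as an in-plane rotation-robust SfM formulation.

```python
import numpy as np


def rigid_inverse(T):
    """Invert a 4x4 rigid transform [R | t] without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


def instances_to_virtual_cameras(object_to_camera):
    """Map each duplicate's object-to-camera pose to a camera-to-object pose.

    The translation column of each result is the real camera's center expressed
    in the canonical object frame, i.e. one virtual viewpoint per duplicate.
    """
    return [rigid_inverse(T) for T in object_to_camera]


def project(K, T, x_obj):
    """Project a canonical-frame 3D point into the image through instance pose T."""
    x_cam = T[:3, :3] @ x_obj + T[:3, 3]
    uvw = K @ x_cam
    return uvw[:2] / uvw[2]


# Usage: two duplicates, the second rotated 90 degrees about the y axis.
T1 = np.eye(4)
T1[:3, 3] = [0.0, 0.0, 5.0]
T2 = np.eye(4)
T2[:3, :3] = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]]
T2[:3, 3] = [1.0, 0.0, 5.0]
cams = instances_to_virtual_cameras([T1, T2])
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
# The same canonical point (the object origin) lands at a different pixel per instance.
pixels = [project(K, T, np.zeros(3)) for T in (T1, T2)]
```

Because every duplicate is rendered from the same canonical geometry and material, each instance contributes an independent observation of that one object, which is what lets a single image provide multi-view constraints.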

Paper Structure (34 sections, 5 equations, 6 figures, 4 tables)

Introduction
Related Work
Inverse rendering:
3D Reconstruction:
Repetitions:
Structure from Duplicates
Collaborative 6-DoF pose estimation
Caveats of random object poses:
Rotation-aware data augmentation:
Joint shape, material, and illumination estimation
Geometry reconstruction:
Material and illumination model:
Optimization
Geometry optimization:
Visibility optimization:
...and 19 more sections

Figures (6)

Figure 1: Repetitions in the visual world. Our physical world is full of identical objects (e.g., cans of coke, cars of the same model, chairs in a classroom). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D.
Figure 2: Method overview: (Left) SfD begins by identifying multiple instances of an object within an image, and then jointly estimates the 6-DoF pose for all instances. (Right) An inverse graphics pipeline is subsequently employed to reason about the shape and material of the object and the environment light, while adhering to the shared geometry and material constraint across instances.
Figure 3: Multi-view inverse rendering.
Figure 4: Multi-view single object (M-S) vs single-view multi-objects (S-M).
Figure 5: Surface normals and rendering results on a real-world cola image. Our model produces the smoothest surface normals among all compared baselines.
...and 1 more figure
