LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

TL;DR

LRM-Zero demonstrates that a Large Reconstruction Model (LRM) can achieve competitive sparse-view 3D reconstruction when trained exclusively on procedurally synthesized data (Zeroverse). By systematically designing augmentations (height field, boolean difference, wireframe) and analyzing the design choices that affect training stability, the work shows that a model can learn useful local geometric priors from synthetic, non-semantic 3D data. On standard benchmarks, the resulting reconstructions approach those of Objaverse-trained baselines, and the approach generalizes to other 3D datasets and NeRF-like methods, suggesting that reconstruction priors can be learned without human-captured or human-crafted 3D assets. The study emphasizes a model-data co-design perspective and contributes open-source synthesis tools to foster further research in synthetic-data-driven 3D vision.

Abstract

We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse), which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to or even more intricate than real objects. We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.
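
The procedural idea described in the abstract (compose simple primitives, texture them at random, then optionally apply an augmentation) can be illustrated with a minimal sketch. This is not the Zeroverse code: the primitive list, the parameter ranges, the per-part augmentation choice, and every name below (`Part`, `sample_object`, `aug_prob`, and so on) are assumptions made for illustration. The sketch only samples a textual "recipe" for one object; building and rendering actual meshes is out of scope.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical vocabularies; the augmentation names follow the abstract,
# the primitive list is an assumption.
PRIMITIVES = ("cube", "sphere", "cylinder", "cone", "torus")
AUGMENTATIONS = ("height_field", "boolean_difference", "wireframe")

Vec3 = Tuple[float, float, float]


@dataclass
class Part:
    """One randomly placed, randomly textured primitive in the recipe."""
    primitive: str
    scale: Vec3
    rotation_deg: Vec3
    translation: Vec3
    texture_id: int
    augmentation: Optional[str] = None


def _vec3(rng: random.Random, lo: float, hi: float) -> Vec3:
    """Sample three independent uniform values in [lo, hi]."""
    return (rng.uniform(lo, hi), rng.uniform(lo, hi), rng.uniform(lo, hi))


def sample_object(
    rng: random.Random,
    max_parts: int = 5,        # assumed upper bound on primitives per object
    num_textures: int = 100,   # assumed size of the texture pool
    aug_prob: float = 0.3,     # assumed chance that a part is augmented
) -> List[Part]:
    """Sample a recipe: a list of primitives with random pose and texture."""
    parts = []
    for _ in range(rng.randint(1, max_parts)):
        part = Part(
            primitive=rng.choice(PRIMITIVES),
            scale=_vec3(rng, 0.2, 1.0),
            rotation_deg=_vec3(rng, 0.0, 360.0),
            translation=_vec3(rng, -0.5, 0.5),
            texture_id=rng.randrange(num_textures),
        )
        # With probability aug_prob, tag the part with one augmentation.
        if rng.random() < aug_prob:
            part.augmentation = rng.choice(AUGMENTATIONS)
        parts.append(part)
    return parts
```

Calling `sample_object(random.Random(0))` returns a reproducible list of `Part` records. Nothing in the recipe encodes object categories or global semantics, which is the property the abstract highlights; a real pipeline would hand such a recipe to a mesh or rendering tool.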

Paper Structure (28 sections, 1 equation, 8 figures, 12 tables)

Figures (8)

  • Figure 1: We present our LRM-Zero framework, trained with the synthesized procedural dataset Zeroverse. Zeroverse (top left) is created from random primitives with textures and augmentations, so it does not contain semantic information as Objaverse does (bottom left). Nevertheless, when the same large reconstruction model architecture [zhang2024gslrm] is trained on both datasets, LRM-Zero can match the visual quality of the Objaverse-trained LRM (denoted as 'LRM') in its reconstructions (right). A possible explanation is that 3D reconstruction, although a core task in 3D vision, relies mostly on local information rather than global semantics. Reconstructions are visualized with RGB and position-based renderings, and interactive viewers can be found on our website.
  • Figure 2: Illustration of the Zeroverse data creation process. A random textured shape is first composited from primitive shapes and textures (Sec. \ref{sec:zeroverse:initial}). Then different augmentations (i.e., height field, boolean difference, and wireframes; Sec. \ref{sec:zeroverse:augmentations}) are applied to enhance the dataset characteristics (e.g., curved surfaces, concavity, and thin structures). More visualizations are in the Appendix and on the website.
  • Figure 3: Qualitative results generated by LRM-Zero trained on Zeroverse with (left two) and without (right two) the boolean difference augmentation. The right two reconstructions have structural failures on objects with concave shapes and complex structures.
  • Figure 4: Qualitative results generated by LRM-Zero trained on default Zeroverse with (left two) and without (right two) the wireframe augmentation. The right two reconstructions have structural failures on objects with thin structures.
  • Figure 5: LRM-Zero's qualitative results on multi-view images generated by Instant3D text-to-3D (left two) and One2345++ image-to-3D (right two).
  • ...and 3 more figures