Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

TL;DR

Unique3D addresses the challenge of generating high-fidelity, textured 3D meshes from a single image by integrating four orthographic views from a multi-view diffusion model with a normal-map diffusion path, a multi-level upscaling strategy, and the ISOMER mesh reconstruction algorithm. The approach achieves high geometric and textural detail within 30 seconds on standard GPUs, outperforming prior SDS-based and multi-view reconstruction methods. Key contributions include the ISOMER pipeline with ExplicitTarget to mitigate multi-view inconsistencies and a robust colorization strategy, along with comprehensive ablations and strong quantitative results on the Objaverse-derived dataset. This work advances practical single-image 3D content creation by delivering fast, consistent, and high-quality textured meshes suitable for real-world applications.

Abstract

In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diverse 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from long per-case optimization times and inconsistency issues. Recent works address this problem and generate better 3D results either by finetuning a multi-view diffusion model or by training a fast feed-forward model. However, they still lack intricate textures and complex geometries due to inconsistency and limited generated resolution. To simultaneously achieve high fidelity, consistency, and efficiency in single image-to-3D, we propose a novel framework, Unique3D, that includes a multi-view diffusion model with a corresponding normal diffusion model to generate multi-view images with their normal maps, a multi-level upscale process to progressively improve the resolution of the generated orthographic multi-views, and an instant and consistent mesh reconstruction algorithm called ISOMER, which fully integrates the color and geometric priors into the mesh results. Extensive experiments demonstrate that Unique3D significantly outperforms other image-to-3D baselines in terms of geometric and textural details.
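
The stage order described in the abstract can be sketched as a simple data flow. This is only an illustrative outline, not the authors' implementation: the `Stages` container, `run_pipeline`, `nearest_2x`, and the toy lambdas are all hypothetical names, the images are plain nested lists, and the choice to predict normals from the initial views before upscaling both streams is an assumption. The real diffusion models and ISOMER are replaced by trivial stand-ins so that the flow can be run end to end.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[float]]  # placeholder image: rows of scalar pixel values


@dataclass
class Stages:
    """Hypothetical stage callables (illustrative names, not from Unique3D)."""
    generate_views: Callable[[Image], List[Image]]  # input image -> orthographic views
    upscale: Callable[[Image], Image]               # one upscaling level
    predict_normal: Callable[[Image], Image]        # color view -> normal map
    reconstruct: Callable[[List[Image], List[Image]], Dict[str, int]]  # stand-in for ISOMER


def run_pipeline(image: Image, stages: Stages, levels: int = 2) -> Dict[str, int]:
    """Chain the stages: views -> normals -> multi-level upscale -> reconstruction."""
    views = stages.generate_views(image)
    if len(views) != 4:
        raise ValueError("expected four orthographic views")
    # Assumption: normals are predicted from the initial views, then both
    # streams are lifted to higher resolution with the same multi-level loop.
    normals = [stages.predict_normal(v) for v in views]
    for _ in range(levels):
        views = [stages.upscale(v) for v in views]
        normals = [stages.upscale(n) for n in normals]
    return stages.reconstruct(views, normals)


def nearest_2x(img: Image) -> Image:
    """Toy stand-in upscaler: duplicate every pixel and every row."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Toy stages so the flow is runnable; none of them does real generation.
toy = Stages(
    generate_views=lambda img: [[row[:] for row in img] for _ in range(4)],
    upscale=nearest_2x,
    predict_normal=lambda img: [[0.0 for _ in row] for row in img],
    reconstruct=lambda views, normals: {
        "num_views": len(views),
        "num_normals": len(normals),
        "height": len(views[0]),
        "width": len(views[0][0]),
    },
)

result = run_pipeline([[0.1, 0.2], [0.3, 0.4]], toy, levels=2)
print(result)
```

Passing the stages in as callables keeps the ordering logic separate from any particular model, so a stand-in can be swapped for a real component without touching `run_pipeline`.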

Paper Structure (19 sections, 7 equations, 11 figures, 3 tables, 2 algorithms)

Figures (11)

  • Figure 1: Gallery of Unique3D. High-fidelity and diverse textured meshes generated by Unique3D from single-view wild images within 30 seconds. https://wukailu.github.io/Unique3D/.
  • Figure 2: Pipeline of our Unique3D. Given a single in-the-wild image as input, we first generate four orthographic multi-view images from a multi-view diffusion model. Then, we progressively improve the resolution of the generated multi-views through a multi-level upscale process. Given the generated color images, we train a normal diffusion model to generate normal maps corresponding to the multi-view images and use a similar strategy to lift them to high-resolution space. Finally, we reconstruct high-quality 3D meshes from the high-resolution color images and normal maps with our instant and consistent mesh reconstruction algorithm ISOMER.
  • Figure 3: Qualitative Comparison. Our approach provides superior geometry and texture.
  • Figure 4: Detailed Comparison. We compare our model with InstantMesh, CRM, and OpenLRM. Our model generates accurate geometry and detailed texture.
  • Figure 5: Ablation Study on ISOMER. (a) Without ExplicitTarget, the output mesh result has obvious defects. (b) Without expansion regularization, the output result collapses in some cases.
  • ...and 6 more figures