Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail
Mingjin Chen, Junhao Chen, Xiaojun Ye, Huan-ang Gao, Xiaoxue Chen, Zhaoxin Fan, Hao Zhao
TL;DR
Ultraman tackles the ill-posed problem of reconstructing textured 3D humans from a single image. It combines depth-informed mesh reconstruction with a diffusion-based, multi-view texture generation pipeline that is guided by text prompts and viewpoints. A GPT-4V-driven VQA module produces detailed prompts, a view-sampling strategy selects ten viewpoints, and a texture-mapping stage projects the synthesized views onto a UV-mapped mesh using seam smoothing and region-aware generation masks. The approach uses IP-Adapter and ControlNet to keep the generated textures consistent across views and with the reconstructed depth. A seam-aware texturing process then preserves frontal details while enriching the back and side textures. Empirical results show that Ultraman reduces inference time to roughly 20–30 minutes and outperforms state-of-the-art methods in both geometry and appearance on standard datasets, and it is consistently favored in user preference studies. This enables fast production of high-fidelity digital humans for AR/VR and related applications.
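The view-sampling step can be pictured with a minimal sketch. The summary only states that ten viewpoints are sampled; the layout below is an assumption for illustration: cameras evenly spaced in azimuth on a ring around a human centered at the origin, with +y up and azimuth 0 as the frontal (input) view along +z. The helpers `sample_ring_viewpoints` and `look_at_forward`, and the `radius` and `elevation_deg` parameters, are hypothetical and do not describe Ultraman's actual strategy, which may weight frontal, side, and back views differently.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sample_ring_viewpoints(n_views: int = 10, radius: float = 2.0,
                           elevation_deg: float = 0.0) -> List[Vec3]:
    """Hypothetical helper: n_views camera centers evenly spaced in azimuth.

    Assumes the subject sits at the origin, +y is up, and azimuth 0
    (the first camera) is the frontal view looking from +z.
    """
    if n_views <= 0:
        raise ValueError("n_views must be positive")
    elev = math.radians(elevation_deg)
    cams: List[Vec3] = []
    for i in range(n_views):
        az = 2.0 * math.pi * i / n_views  # equal angular spacing around the body
        x = radius * math.cos(elev) * math.sin(az)
        y = radius * math.sin(elev)
        z = radius * math.cos(elev) * math.cos(az)
        cams.append((x, y, z))
    return cams


def look_at_forward(cam: Vec3, target: Vec3 = (0.0, 0.0, 0.0)) -> Vec3:
    """Unit viewing direction from a camera center toward the target point."""
    d = [t - c for c, t in zip(cam, target)]
    norm = math.sqrt(sum(v * v for v in d))
    return (d[0] / norm, d[1] / norm, d[2] / norm)


cams = sample_ring_viewpoints()
print(len(cams), look_at_forward(cams[0]))
```

With ten views, index 5 lands directly behind the subject (azimuth 180°). This is why an even ring is a convenient starting point for covering the back and sides that the single input image never observes.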
Abstract
3D human body reconstruction has long been a challenging problem in computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called \emph{Ultraman} for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, \emph{Ultraman} greatly improves reconstruction speed and accuracy while preserving high-quality texture details. We present a new human reconstruction framework consisting of three parts: geometric reconstruction, texture generation, and texture mapping. First, a mesh reconstruction framework accurately extracts the 3D human shape from a single image. Second, we propose a method that generates multi-view consistent images of the human body from that single image. Finally, these are combined with a novel texture mapping method that optimizes texture details and ensures color consistency during reconstruction. Through extensive experiments and evaluations, we demonstrate the superior performance of \emph{Ultraman} on various standard datasets. In addition, \emph{Ultraman} outperforms state-of-the-art methods in terms of human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.
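To make "color consistency during texture mapping" concrete, here is a minimal sketch of one common way to merge several generated views into a single texel color: a visibility-masked, cosine-weighted average. This is a generic multi-view blending technique swapped in for illustration, not Ultraman's seam-aware method. The function `fuse_texel_color` and its `power` exponent are hypothetical. The sketch assumes each view direction is a unit vector pointing from the surface point toward that camera.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def fuse_texel_color(normal: Sequence[float],
                     view_dirs: Sequence[Sequence[float]],
                     colors: Sequence[Color],
                     visible: Sequence[bool],
                     power: float = 2.0) -> Color:
    """Hypothetical cosine-weighted blend of per-view colors for one texel.

    Each view's weight is cos(theta)**power, where theta is the angle between
    the surface normal and the direction to that camera. Occluded views and
    views that see the surface edge-on or from behind get zero weight.
    """
    total_w = 0.0
    acc: List[float] = [0.0, 0.0, 0.0]
    for d, c, vis in zip(view_dirs, colors, visible):
        cos_t = sum(n * v for n, v in zip(normal, d))
        if not vis or cos_t <= 0.0:
            continue  # this view contributes nothing to the texel
        w = cos_t ** power  # favor views that see the texel head-on
        total_w += w
        for k in range(3):
            acc[k] += w * c[k]
    if total_w == 0.0:
        # Unobserved texel: left black here; a real pipeline would inpaint it.
        return (0.0, 0.0, 0.0)
    return (acc[0] / total_w, acc[1] / total_w, acc[2] / total_w)


# A frontal view (red) and an oblique view (green) of a texel facing +z.
print(fuse_texel_color((0.0, 0.0, 1.0),
                       [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8)],
                       [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                       [True, True]))
```

Raising `power` makes the blend lean harder on the most head-on view. This is one simple lever for keeping frontal detail sharp while still letting oblique views fill regions the front camera cannot see. Plain averaging of this kind can still leave visible seams where the set of contributing views changes, which is the problem that seam-smoothing steps such as those described for Ultraman address.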
