ShapeGen: Towards High-Quality 3D Shape Synthesis
Yangguang Li, Xianglong He, Zi-Xin Zou, Zexiang Liu, Wanli Ouyang, Ding Liang, Yan-Pei Cao
TL;DR
ShapeGen tackles the challenge of producing artist-grade 3D shapes from a single image through targeted improvements to the VAE representation, resolution, conditioning, and inference strategy. It supervises an SDF representation with a BCE loss to retain the benefits of occupancy-style training, scales up data and token resolution, conditions on mixed RGB and normal inputs, and uses linear attention and inference-time scaling to achieve high-fidelity geometry and image-geometry consistency. Empirical results show state-of-the-art image-to-3D generation performance, with strong generalization on large-scale data and gains over prior methods in both qualitative comparisons and user studies. The approach remains efficient (1.5B parameter scale) and practical for real-world 3D pipelines, with room for further gains through edge-aware sampling and larger datasets.
Abstract
Inspired by generative paradigms in image and video, 3D shape generation has made notable progress, enabling the rapid synthesis of high-fidelity 3D assets from a single image. However, current methods still face challenges, including a lack of intricate detail, overly smoothed surfaces, and fragmented thin-shell structures. These limitations leave the generated 3D assets one step short of the standards favored by artists. In this paper, we present ShapeGen, which achieves high-quality image-to-3D shape generation through improvements to 3D representation and supervision, resolution scaling, and the advantages of linear transformers. These advancements allow the generated assets to be integrated seamlessly into 3D pipelines, facilitating their adoption across a wide range of applications. Through extensive experiments, we validate the impact of each improvement on overall performance. Thanks to the combined effect of these enhancements, ShapeGen achieves a significant leap in image-to-3D generation, establishing a new state of the art.
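To make the reference to linear transformers concrete, the following is a minimal sketch of generic kernelized linear attention, not ShapeGen's actual implementation. The abstract does not specify which linear-attention variant is used, so the elu(x) + 1 feature map, the function names, and the tensor shapes here are all illustrative assumptions. The point is that summarizing keys and values once lets each query be answered without forming the full query-by-key weight matrix, which is what makes higher token resolutions affordable.

```python
import numpy as np


def elu_plus_one(x):
    # Assumed positive feature map phi(x) = elu(x) + 1 (a common choice, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with cost linear in the number of tokens.

    q: (n, d) queries, k: (m, d) keys, v: (m, d_v) values.
    """
    phi_q = elu_plus_one(q)        # (n, d)
    phi_k = elu_plus_one(k)        # (m, d)
    kv = phi_k.T @ v               # (d, d_v): key/value summary, computed once
    z = phi_k.sum(axis=0)          # (d,): normalizer summary
    numerator = phi_q @ kv         # (n, d_v)
    denominator = phi_q @ z + eps  # (n,)
    return numerator / denominator[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    # Same weighting computed the O(n * m) way, for comparison only.
    weights = elu_plus_one(q) @ elu_plus_one(k).T  # (n, m)
    weights = weights / (weights.sum(axis=1, keepdims=True) + eps)
    return weights @ v
```

Both functions compute the same output; the linear version only reorders the matrix products so that the (n, m) weight matrix is never materialized.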
