Large Point-to-Gaussian Model for Image-to-3D Generation
Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia
TL;DR
This work tackles image-to-3D generation with a Point-to-Gaussian framework: a large diffusion model conditioned on a single image first produces a point cloud that serves as a geometry prior, and the framework then converts that point cloud into explicit 3D Gaussian parameters for rendering. The core innovation is the APP block (Attention, Projection, Point feature extractor), which fuses 2D image features with 3D point-cloud features across multiple scales, enabling robust cross-modality learning. A multi-scale Gaussian decoder and a point-cloud upsampler produce high-fidelity Gaussians that, when rendered with Gaussian splatting, achieve strong qualitative and quantitative performance on the Objaverse and GSO datasets, even though the model is trained on a relatively small dataset. Inference is fast once the initial point cloud has been generated, and the method shows strong generalization and view consistency, highlighting its practical potential for rapid, high-quality 3D asset generation from images.
Abstract
Recently, image-to-3D approaches based on large reconstruction models, particularly 3D Gaussian reconstruction models, have significantly advanced the generation quality and speed of 3D assets. Existing large 3D Gaussian models map a 2D image directly to 3D Gaussian parameters, but regressing 3D Gaussian representations from a 2D image is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model for image-to-3D generation. It takes as input an initial point cloud, produced by a large 3D diffusion model conditioned on the 2D image, and generates the Gaussian parameters. The point cloud provides an initial 3D geometry prior for Gaussian generation, thus significantly facilitating image-to-3D generation. Moreover, we present the \textbf{A}ttention mechanism, \textbf{P}rojection mechanism, and \textbf{P}oint feature extractor, dubbed the \textbf{APP} block, for fusing the image features with point cloud features. Extensive qualitative and quantitative experiments on the GSO and Objaverse datasets demonstrate the effectiveness of the proposed approach and show that it achieves state-of-the-art performance.
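To make the idea of fusing image features with point cloud features through projection more concrete, the following is a minimal sketch of one common way such a step can be realized. It is not taken from the paper: the abstract does not specify the camera model, the sampling scheme, or the fusion operator. The sketch assumes a pinhole camera with intrinsics `K`, points already expressed in camera space, nearest-neighbour feature lookup, and fusion by simple concatenation. The function names `project_points`, `sample_image_features`, and `fuse_point_and_image_features` are hypothetical.

```python
import numpy as np


def project_points(points, K):
    """Project (N, 3) camera-space points to (N, 2) pixel coordinates.

    Assumes a pinhole camera with 3x3 intrinsics K and positive depth (z > 0).
    """
    homogeneous = points @ K.T          # row i equals K @ points[i]
    return homogeneous[:, :2] / homogeneous[:, 2:3]


def sample_image_features(feat_map, uv):
    """Look up (H, W, C) image features at (N, 2) pixel coordinates.

    Uses nearest-neighbour sampling and clamps coordinates to the image
    bounds; a real model would more likely use bilinear interpolation.
    """
    height, width, _ = feat_map.shape
    u = np.clip(np.rint(uv[:, 0]).astype(int), 0, width - 1)
    v = np.clip(np.rint(uv[:, 1]).astype(int), 0, height - 1)
    return feat_map[v, u]               # shape (N, C)


def fuse_point_and_image_features(points, point_feats, feat_map, K):
    """Concatenate per-point features with image features sampled by projection.

    Concatenation is an illustrative choice, not the paper's fusion operator.
    """
    uv = project_points(points, K)
    image_feats = sample_image_features(feat_map, uv)
    return np.concatenate([point_feats, image_feats], axis=1)
```

In this sketch each 3D point receives the image feature at the pixel it projects to, so the fused tensor has one row per point and `C_point + C_image` columns. An attention layer or a point feature extractor could then operate on these fused per-point features.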
