Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation
Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding
TL;DR
Sketch3D tackles the challenge of generating texture-consistent 3D assets from sketches with textual color guidance. It introduces a three-stage pipeline: shape-preserving reference generation, coarse 3D Gaussian initialization, and IP-Adapter guided multi-view optimization with three loss terms (distribution transfer SDS, a color $L_2$ loss, and a CLIP-based sketch constraint). The approach yields realistic 3D Gaussians whose shape follows the sketch and whose color follows the text, takes around 3 minutes per object, and outperforms baselines on CLIP similarity and SSIM on ShapeNet-Sketch3D. This enables fast, controllable sketch-to-3D asset creation suitable for game engines and design workflows. The authors note that results are sensitive to reference image quality and to sketch complexity.
Abstract
Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, such color-rich inputs are not always available in practical applications, where only sketches may be at hand. Existing sketch-to-3D methods are limited in their applicability because sketches lack color information and multi-view content. To overcome these limitations, this paper proposes a novel generation paradigm, Sketch3D, to generate realistic 3D assets whose shape is aligned with the input sketch and whose color matches the textual description. Concretely, Sketch3D first instantiates the given sketch in a reference image through a shape-preserving generation process. Second, the reference image is used to derive a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated from renderings of the 3D Gaussians. Finally, three strategies are designed to optimize the 3D Gaussians: structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss, and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of Sketch3D in generating realistic 3D assets while preserving consistency with the input.
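The three optimization strategies named in the abstract can be read as a weighted sum of three loss terms. The following is a minimal sketch of that combination in plain Python, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the equal default weights, the use of 1 minus cosine similarity as the sketch term, and the representation of images and embeddings as flat lists of floats. The structural (distribution transfer) term is passed in as a precomputed number because the abstract does not specify its form.

```python
import math
from typing import Sequence


def mse_loss(rendered: Sequence[float], guidance: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    if len(rendered) != len(guidance) or len(rendered) == 0:
        raise ValueError("images must be non-empty and the same size")
    return sum((r - g) ** 2 for r, g in zip(rendered, guidance)) / len(rendered)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def sketch_similarity_loss(render_emb: Sequence[float],
                           sketch_emb: Sequence[float]) -> float:
    """Assumed form of the sketch constraint: 1 - cosine similarity.

    In practice the embeddings would come from a CLIP image encoder;
    here they are plain vectors supplied by the caller.
    """
    return 1.0 - cosine_similarity(render_emb, sketch_emb)


def total_loss(structure_loss: float,
               rendered: Sequence[float],
               guidance: Sequence[float],
               render_emb: Sequence[float],
               sketch_emb: Sequence[float],
               w_structure: float = 1.0,
               w_color: float = 1.0,
               w_sketch: float = 1.0) -> float:
    """Weighted sum of the three terms; the weights are hypothetical."""
    color = mse_loss(rendered, guidance)
    sketch = sketch_similarity_loss(render_emb, sketch_emb)
    return w_structure * structure_loss + w_color * color + w_sketch * sketch
```

In a real pipeline each term would be computed on differentiable tensors so that gradients flow back to the 3D Gaussian parameters; the scalar version above only shows how the terms are assumed to combine.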
