
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding

TL;DR

Sketch3D tackles the challenge of generating texture-consistent 3D assets from sketches, using a text prompt to specify color. It introduces a three-stage pipeline: shape-preserving reference image generation, coarse 3D Gaussian initialization, and IP-Adapter-guided multi-view optimization with three loss terms (a distribution-transfer SDS loss, a color $L_2$ (MSE) loss, and a CLIP-based sketch constraint). The resulting 3D Gaussians are realistic, align with the sketch, and take their colors from the text. Generation takes about 3 minutes per object, and the method outperforms the baselines on CLIP similarity and SSIM on ShapeNet-Sketch3D. This enables fast, controllable sketch-to-3D asset creation for game engines and design workflows; the authors note that results are sensitive to the quality of the reference image and to the complexity of the input sketch.
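The three loss terms named in the TL;DR can be read as one weighted objective. The form below is a sketch under stated assumptions, not the paper's equation: the weights $\lambda$ are hypothetical, $R_v(\theta)$ denotes the rendering of the Gaussians with parameters $\theta$ from view $v$, and $I_{\mathrm{g}}^{(v)}$ is the guidance image for that view (the squared $L_2$ term equals the MSE up to division by the pixel count). The second line is the standard score distillation sampling (SDS) gradient introduced in DreamFusion, where $x_t$ is the rendering noised to timestep $t$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction given the text condition $y$, and $w(t)$ is a timestep weighting. Sketch3D's distribution transfer mechanism modifies this guidance; its exact form is not reproduced here.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{total}} &= \lambda_{\mathrm{SDS}}\,\mathcal{L}_{\mathrm{SDS}}
  + \lambda_{\mathrm{color}} \sum_{v} \bigl\lVert R_v(\theta) - I_{\mathrm{g}}^{(v)} \bigr\rVert_2^2
  + \lambda_{\mathrm{sketch}}\,\mathcal{L}_{\mathrm{sketch}}, \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} &= \mathbb{E}_{t,\epsilon}\Bigl[ w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \Bigr],
  \qquad x = R_v(\theta).
\end{aligned}
```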

Abstract

Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, such color-rich inputs are not always available in practical applications, where only sketches may be available. Existing sketch-to-3D research is limited in broad applications because sketches lack both color information and multi-view content. To overcome these limitations, this paper proposes a novel generation paradigm, Sketch3D, which generates realistic 3D assets whose shape aligns with the input sketch and whose color matches the textual description. Concretely, Sketch3D first instantiates the given sketch as a reference image through a shape-preserving generation process. Second, the reference image is used to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated from renderings of the 3D Gaussians. Finally, three strategies are designed to optimize the 3D Gaussians: structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss, and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis demonstrate the advantage of Sketch3D in generating realistic 3D assets while preserving consistency with the input.
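The color and sketch-similarity strategies can be illustrated with a short, dependency-free Python sketch. Assumptions: images are flat lists of floats; `render_emb` and `sketch_emb` are hypothetical placeholders for CLIP image embeddings of a rendered view and of the input sketch; the sketch term is written as one minus cosine similarity, a common CLIP-based choice that is not necessarily the paper's geometric loss (CLIPasso-style geometric losses, for instance, compare intermediate CLIP activations); and the weights in `total_loss` are placeholders.

```python
import math


def color_mse(rendered, guidance):
    """Mean squared error between a rendered view and its guidance image."""
    if not rendered or len(rendered) != len(guidance):
        raise ValueError("images must be non-empty and have the same number of pixels")
    return sum((r - g) ** 2 for r, g in zip(rendered, guidance)) / len(rendered)


def sketch_similarity_loss(render_emb, sketch_emb):
    """1 - cosine similarity between two embedding vectors (0 = same direction)."""
    dot = sum(a * b for a, b in zip(render_emb, sketch_emb))
    norm = math.sqrt(sum(a * a for a in render_emb)) * math.sqrt(sum(b * b for b in sketch_emb))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def total_loss(sds, color, sketch, w_sds=1.0, w_color=1.0, w_sketch=1.0):
    """Weighted sum of the three terms; the default weights are placeholders."""
    return w_sds * sds + w_color * color + w_sketch * sketch


rendered = [0.2, 0.5, 0.8, 1.0]   # hypothetical pixel values of one rendered view
guidance = [0.2, 0.4, 0.8, 0.8]   # hypothetical guidance image for the same view
color = color_mse(rendered, guidance)  # (0.1**2 + 0.2**2) / 4, about 0.0125

# Hypothetical stand-ins for CLIP image embeddings.
render_emb = [0.9, 0.1, 0.4]
sketch_emb = [0.8, 0.2, 0.5]
sketch = sketch_similarity_loss(render_emb, sketch_emb)

sds = 0.3  # placeholder value; computing SDS requires a diffusion model
print(total_loss(sds, color, sketch))
```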

Paper Structure (15 sections, 11 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Pipeline of our Sketch3D. Given a sketch image and a text prompt as input, we first generate a reference image $I_\mathrm{ref}$ using ControlNet. Second, we utilize the reference image $I_\mathrm{ref}$ to initialize a coarse 3D prior $M_{0}$, which is represented using 3D Gaussians. Third, we render the 3D Gaussians into images from different viewpoints using a designated camera projection strategy. Based on these, we obtain multi-view style-consistent guidance images through the IP-Adapter. Finally, we formulate three strategies to optimize $M_{0}$: (a) Structural Optimization: a distribution transfer mechanism is proposed for structural optimization, effectively steering the structure generation process towards alignment with the sketch. (b) Color Optimization: based on multi-view style-consistent images, we optimize color with a straightforward MSE loss. (c) Sketch Similarity Optimization: a CLIP-based geometric similarity loss is used as a constraint that pulls the shape toward the input sketch.
  • Figure 2: For each object, the first row shows content images and the second row shows guidance images. Given reference image $I_\mathrm{ref}$ generated by ControlNet and content images $I_\mathrm{c}$ rendered from the 3D Gaussians, we generate the guidance images $I_\mathrm{g}$ as the multi-view style-consistent images.
  • Figure 3: Qualitative comparisons between our method and Sketch2Model [sketch2model], LAS-Diffusion [LAS], Shap-E [shape-e], One-2-3-45 [One-2-3-45], and DreamGaussian [dreamgaussian]. The input sketches include sketch images, exterior contour sketches, and hand-drawn sketches. Our method achieves the best visual results in shape consistency and color generation quality among the compared methods.
  • Figure 4: Ablation study. Two different angles are selected for each object. Red boxes show details.
  • Figure 5: Analytical study of the initialization approach of the 3D Gaussian Representation.
  • ...and 1 more figure
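Figures 1 and 2 obtain the multi-view guidance images through IP-Adapter, conditioned on the reference image $I_\mathrm{ref}$. IP-Adapter injects an image prompt through decoupled cross-attention: image features get their own key/value projections, and the resulting attention output is added, scaled by a factor $\lambda$, to the usual text cross-attention output. The dependency-free sketch below illustrates only that mechanism on toy vectors. It assumes keys and values are already projected, omits learned weights and multi-head structure, and does not model how the content images $I_\mathrm{c}$ enter the generation.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    d = len(queries[0])
    dim_v = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return out


def decoupled_cross_attention(queries, text_kv, image_kv, scale=1.0):
    """IP-Adapter-style output: text attention + scale * image attention.

    `text_kv` and `image_kv` are (keys, values) pairs assumed to be already
    projected; in IP-Adapter the image pair comes from separate, trainable
    key/value projections of the image-prompt features.
    """
    text_out = attention(queries, *text_kv)
    image_out = attention(queries, *image_kv)
    return [
        [t + scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]


# Toy inputs (hypothetical numbers).
queries = [[1.0, 0.0], [0.0, 1.0]]                       # e.g. UNet features of the view being generated
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
image_kv = ([[0.5, 0.5]], [[10.0, 10.0]])                # stand-in for I_ref image tokens

print(decoupled_cross_attention(queries, text_kv, image_kv, scale=0.0)
      == attention(queries, *text_kv))  # True: scale 0 recovers text-only attention
print(decoupled_cross_attention(queries, text_kv, image_kv, scale=0.5))
```

With `scale=0.0` the layer reduces to ordinary text cross-attention; larger values let the image prompt (here, the stand-in for $I_\mathrm{ref}$) steer appearance more strongly, which is what makes the guidance images share the reference image's style.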
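Figure 5 analyzes how the 3D Gaussian representation is initialized. Whatever produces the coarse prior $M_{0}$, its points eventually have to be converted into Gaussian parameters. The sketch below borrows the heuristic from the original 3D Gaussian Splatting code (scale from the mean squared distance to the 3 nearest neighbours, identity rotation, initial opacity 0.1 stored as a logit). It is an assumption for illustration, not the initialization studied in the paper, and it keeps raw RGB colors instead of spherical-harmonic coefficients; `init_gaussians` is a hypothetical helper.

```python
import math


def inverse_sigmoid(p):
    return math.log(p / (1.0 - p))


def init_gaussians(points, colors, k=3, init_opacity=0.1):
    """Turn a colored point cloud into isotropic 3D Gaussian parameters.

    Hypothetical helper following the 3D Gaussian Splatting heuristic:
    scale from the mean squared distance to the k nearest neighbors,
    identity rotation, and opacity `init_opacity` stored as a logit.
    """
    if len(points) <= k:
        raise ValueError("need more than k points")
    gaussians = []
    for i, p in enumerate(points):
        # Brute-force O(N^2) neighbor search; fine for a toy example.
        sq_dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q))
            for j, q in enumerate(points) if j != i
        )
        mean_sq = max(sum(sq_dists[:k]) / k, 1e-7)  # clamp to avoid log(0)
        log_scale = math.log(math.sqrt(mean_sq))
        gaussians.append({
            "position": list(p),
            "log_scale": [log_scale] * 3,             # isotropic to start with
            "rotation": [1.0, 0.0, 0.0, 0.0],         # identity quaternion (w, x, y, z)
            "opacity_logit": inverse_sigmoid(init_opacity),
            "color": list(colors[i]),                 # raw RGB, not SH coefficients
        })
    return gaussians


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
colors = [(0.8, 0.2, 0.2)] * 4
gaussians = init_gaussians(points, colors)
print(gaussians[0]["log_scale"])  # [0.0, 0.0, 0.0]: the origin's 3 neighbors are all at distance 1
```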