Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering

Yibo Zhang, Lihong Wang, Changqing Zou, Tieru Wu, Rui Ma

TL;DR

Diff3DS addresses the challenge of generating view-consistent 3D sketches from flexible inputs (text or a single image) by representing the sketch as a set of 3D rational Bézier curve strokes and rendering them through a perspective projection into 2D curves. It introduces a depth-aware differentiable rasterizer that preserves occlusion and color ordering, enabling end-to-end optimization via 2D supervision and Score Distillation Sampling from pretrained diffusion models. Key contributions include (1) a 3D rational Bézier curve representation with a differentiable projection and rasterization pipeline, (2) a differentiable, depth-aware rasterizer that maintains correct depth ordering for colored strokes, and (3) demonstrations of text-to-3D and image-to-3D sketch generation with quantitative and qualitative evidence and comprehensive ablations. The work advances user-friendly, view-consistent 3D sketch generation with potential for broader multimodal 3D content creation and scene-level extension, while noting limitations such as gradient sparsity and fixed initial curve counts.

Abstract

3D sketches are widely used for visually representing the 3D shape and structure of objects or scenes. However, creating a 3D sketch often requires professional artistic skills. Existing research efforts primarily focus on enhancing interactive sketch generation in 3D virtual systems. In this work, we propose Diff3DS, a novel differentiable rendering framework for generating view-consistent 3D sketches by optimizing 3D parametric curves under various supervisions. Specifically, we perform perspective projection to render the 3D rational Bézier curves into 2D curves, which are subsequently converted to a 2D raster image via our customized differentiable rasterizer. Our framework bridges the domains of 3D sketch and raster image, achieving end-to-end optimization of the 3D sketch through gradients computed in the 2D image domain. Diff3DS enables a series of novel 3D sketch generation tasks, including text-to-3D sketch and image-to-3D sketch, supported by popular distillation-based supervision such as Score Distillation Sampling (SDS). Extensive experiments have yielded promising results and demonstrated the potential of our framework. The project page is at https://yiboz2001.github.io/Diff3DS/.
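
The projection step described in the abstract relies on a standard property of rational Bézier curves: under a pinhole perspective projection, a 3D rational Bézier curve maps to a 2D rational Bézier curve whose control points are the projected 3D control points and whose weights are the original weights multiplied by each control point's depth. The following is a minimal sketch of that property only, not the paper's renderer. It assumes camera-space coordinates with positive depth `z`, and the function names and the `focal` parameter are hypothetical.

```python
from math import comb


def rational_bezier(points, weights, t):
    """Evaluate a rational Bezier curve (any dimension) at parameter t in [0, 1]."""
    n = len(points) - 1
    # Bernstein basis values for degree n.
    basis = [comb(n, i) * (1 - t) ** (n - i) * t ** i for i in range(n + 1)]
    denom = sum(b * w for b, w in zip(basis, weights))
    dim = len(points[0])
    return tuple(
        sum(b * w * p[k] for b, w, p in zip(basis, weights, points)) / denom
        for k in range(dim)
    )


def project_point(p, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z) with z > 0 (assumed camera model)."""
    x, y, z = p
    return (focal * x / z, focal * y / z)


def project_curve(points, weights, focal=1.0):
    """Project 3D control points to 2D and rescale each weight by its control point's depth."""
    points_2d = [project_point(p, focal) for p in points]
    weights_2d = [w * p[2] for w, p in zip(weights, points)]
    return points_2d, weights_2d
```

Evaluating the 2D curve returned by `project_curve` at some `t` gives the same image point as evaluating the 3D curve at `t` and then calling `project_point`, which is why the projected stroke can be handed to a 2D curve rasterizer in closed form.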

Paper Structure

This paper contains 52 sections, 33 equations, 26 figures, and 3 tables.

Figures (26)

  • Figure 1: In this paper, we propose Diff3DS, a novel differentiable rendering framework for generating view-consistent 3D sketches from flexible inputs such as a single image or text.
  • Figure 2: A rasterized result containing 3 quadratic rational Bézier curves and a line. All curves share the same control point positions in pixel space and the same depths, but have different weights. Our rasterizer faithfully renders the curves and maintains occlusions according to the depth order. For example, the upper half of each curve has a greater depth than the line, while the lower half has a lesser depth, which results in different color blending outcomes in the overlapping regions.
  • Figure 3: We generate the 3D sketch $\tilde{\Theta}$, represented as a set of 3D strokes, from the text or image input. We render the raster image $I$ from a random camera via our differentiable renderer (\ref{sec:DiffBC}). Then, the pre-trained diffusion model, conditioned on the input, diffuses the rendering $I$ and predicts the pseudo ground truth $\hat{I}_0$. The discrepancies between $\hat{I}_0$ and $I$ are used to update the 3D sketch.
  • Figure 4: Pseudo ground truth $\hat{I}_0$ visualization. As the CFG weight $\lambda$ decreases, the effective supervision region of $\hat{I}_0$ also decreases.
  • Figure 5: Qualitative results of the text-to-3D sketch task. Existing text-to-3D methods fail to generate sketch-style results, even with the addition of the "sketch in black and white, line drawing" suffix.
  • ...and 21 more figures
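
The depth-dependent blending described in the Figure 2 caption can be illustrated with ordinary back-to-front alpha compositing at a single pixel. This is a simplified sketch under stated assumptions, not the paper's rasterizer: it assumes each stroke covering the pixel contributes a fragment `(depth, rgb, alpha)`, that a larger depth means farther from the camera, and that the background is white. The function name `composite_pixel` is hypothetical.

```python
def composite_pixel(fragments, background=(1.0, 1.0, 1.0)):
    """Blend stroke fragments covering one pixel, farthest first ("over" operator).

    fragments: iterable of (depth, (r, g, b), alpha); larger depth = farther away (assumption).
    """
    color = background
    # Sort far-to-near so that nearer strokes are blended on top of farther ones.
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * c + (1.0 - alpha) * b for c, b in zip(rgb, color))
    return color
```

With a half-transparent red stroke and a half-transparent blue stroke on the same pixel, swapping which one is nearer changes the blended color, matching the caption's observation that the overlap looks different where the depth order flips.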