Magic3DSketch: Create Colorful 3D Models From Sketch-Based 3D Modeling Guided by Text and Language-Image Pre-Training

Ying Zang, Yidong Han, Chaotao Ding, Jianqi Zhang, Tianrun Chen

TL;DR

Magic3DSketch lowers the barrier of expert-level 3D modeling by generating colored 3D meshes from a single sketch and a text prompt. It combines an encoder–decoder mesh generator with CLIP-based supervision, including multi-view losses and a viewpoint predictor, to produce faithful geometry with controllable structure. A two-stage, CLIP-guided stylization pipeline then adds texture and color based on the text prompt. On ShapeNet-Synthetic and ShapeNet-Sketch, the approach achieves state-of-the-art results and runs in real time (>100 FPS). In a user study, participants rated it higher for controllability and satisfaction, which points to practical potential for rapid AR/VR content creation and design pipelines.
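
To make the idea of a multi-view, CLIP-style loss concrete, the following is a minimal sketch and not the paper's implementation. It assumes that each rendered view of the mesh and the text prompt have already been mapped to embedding vectors by some language-image model; the function names `cosine_similarity` and `multi_view_text_loss`, and the choice of averaging 1 minus cosine similarity over views, are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def multi_view_text_loss(view_embeddings, text_embedding):
    """Average (1 - cosine similarity) between each view and the text.

    view_embeddings: one embedding per rendered view (hypothetical,
    assumed to come from a pre-trained image encoder).
    text_embedding: embedding of the text prompt (hypothetical,
    assumed to come from the matching text encoder).
    """
    if not view_embeddings:
        raise ValueError("need at least one view embedding")
    losses = [1.0 - cosine_similarity(v, text_embedding)
              for v in view_embeddings]
    return sum(losses) / len(losses)


# Toy example: one view aligned with the text, one orthogonal to it.
text = [1.0, 0.0]
views = [[1.0, 0.0], [0.0, 1.0]]
loss = multi_view_text_loss(views, text)
```

Averaging over several views penalizes a shape that matches the prompt from only one angle, which is one plausible reason to supervise with more than the input viewpoint.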

Abstract

The demand for 3D content is growing as AR/VR applications emerge. At the same time, 3D modeling remains accessible only to skilled experts, because traditional methods such as Computer-Aided Design (CAD) are labor-intensive and skill-demanding, which makes them challenging for novice users. Our proposed method, Magic3DSketch, encodes a sketch to predict a 3D mesh, guided by text descriptions and leveraging external prior knowledge obtained through text and language-image pre-training. The integration of language-image pre-trained neural networks compensates for the sparse and ambiguous nature of single-view sketch inputs. According to our user study, our method is also more useful and offers a higher degree of controllability than existing text-to-3D approaches. Moreover, Magic3DSketch achieves state-of-the-art performance on both synthetic and real datasets and, with the help of text input, produces more detailed structures and more realistic shapes. Users are also more satisfied with the models obtained by Magic3DSketch, according to our user study. Additionally, to our knowledge, we are the first to add color to sketch-derived shapes based on text descriptions. By combining sketches and text guidance with the help of language-image pre-trained models, Magic3DSketch allows novice users to create custom 3D models with minimal effort and maximum creative freedom, with the potential to revolutionize future 3D modeling pipelines.

Paper Structure (23 sections, 6 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Magic3DSketch utilizes a single sketch as input, along with a text prompt, to generate a high-fidelity and realistic 3D mesh complete with colors.
  • Figure 2: The pipeline of Magic3DSketch. Magic3DSketch takes a single-view sketch and text prompts to produce high-quality 3D objects. The neural network components on a green background are designed to generalize and are trained end-to-end. The components on an orange background are optimized on a per-object basis.
  • Figure 3: Qualitative comparison with existing state-of-the-art sketch-to-model approaches. The visualizations of the generated 3D models demonstrate that our method synthesizes higher-fidelity and more realistic 3D models.
  • Figure 4: Representative results on our ShapeNet-Sketch dataset. Compared with alternative methods, Magic3DSketch can create more advanced and promising shapes from hand-drawn sketches.
  • Figure 5: Visualizations of the generated 3D models demonstrate that our method can synthesize high-quality 3D objects with shapes from sketches and text prompts.
  • ...and 4 more figures