Art3D: Training-Free 3D Generation from Flat-Colored Illustration

Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar

TL;DR

Art3D tackles the challenge of generating plausible 3D assets from flat-colored illustrations without additional training. It introduces a modular, training-free pipeline that augments flat inputs with 3D priors from pre-trained 2D diffusion models, selects the most realistic proxy image via vision-language model (VLM) reasoning, and then synthesizes a complete shape with Trellis while baking texture with Hunyuan2.0, all guided by the original input. A new dataset, Flat-2D, is introduced to benchmark generalization, and experiments show Art3D producing complete, textured meshes where prior image-to-3D methods yield degenerate, thin geometries because flat-colored inputs fall outside their training distribution. Overall, the approach broadens the practical use of image-to-3D foundation models in arts, games, and VR/AR by bridging the gap between flat illustrations and realistic 3D cues.

Abstract

Large-scale pre-trained image-to-3D generative models have exhibited remarkable capabilities in generating diverse shapes. However, most of them struggle to synthesize plausible 3D assets when the reference image is flat-colored, as in hand drawings, because such images lack a 3D illusion, even though they are often the most user-friendly input modality in art content creation. To this end, we propose Art3D, a training-free method that lifts flat-colored 2D designs into 3D. By leveraging structural and semantic features with pre-trained 2D image generation models and a VLM-based realism evaluation, Art3D enhances the three-dimensional illusion in reference images, thereby simplifying 3D generation from 2D, and proves adaptable to a wide range of painting styles. To benchmark how well existing image-to-3D models generalize to flat-colored images that lack 3D cues, we collect a new dataset, Flat-2D, with over 100 samples. Experimental results demonstrate the performance and robustness of Art3D, which exhibits superior generalization capacity and promising practical applicability. Our source code and dataset will be publicly available on our project page: https://joy-jy11.github.io/ .
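
As an illustration of the VLM-based realism evaluation, the selection step can be sketched as a best-of-N choice over candidate proxy images. This is a minimal sketch under assumptions, not the authors' implementation: the `Proxy` type, the `select_most_realistic` helper, and the hard-coded scores in `fake_vlm_score` are all hypothetical stand-ins, since the text does not specify how the VLM is prompted or how its judgment is turned into a ranking.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Proxy:
    """A candidate proxy image produced by the 2D augmentation step (hypothetical type)."""
    name: str      # illustrative label, e.g. which structure feature guided generation
    image: object  # placeholder for the actual image data


def select_most_realistic(
    candidates: Sequence[Proxy],
    realism_score: Callable[[Proxy], float],
) -> Proxy:
    """Return the candidate with the highest realism score."""
    if not candidates:
        raise ValueError("need at least one candidate proxy")
    return max(candidates, key=realism_score)


def fake_vlm_score(proxy: Proxy) -> float:
    """Stand-in for a VLM query; these scores are made up for the example."""
    scores = {"flat": 0.10, "canny": 0.62, "depth": 0.81}
    return scores.get(proxy.name, 0.0)


candidates = [Proxy("flat", None), Proxy("canny", None), Proxy("depth", None)]
best = select_most_realistic(candidates, fake_vlm_score)
print(best.name)
```

Passing the scorer in as a callable keeps the selection logic independent of any particular VLM, so a real model query could replace `fake_vlm_score` without changing the rest of the sketch.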

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: Our Art3D creates high-quality 3D assets from a single flat-colored illustration and can be adapted to various drawing styles.
  • Figure 2: We compare the generation results of InstantMesh (Xu et al., 2024) for two kinds of image inputs. The top row presents results obtained from images with a three-dimensional appearance, while the bottom row shows results using flat-colored images as input. We observe that while existing methods perform well on data within their training distribution, they tend to degenerate and produce abnormally thin geometric structures when applied to flat-colored images.
  • Figure 3: Pipeline. Art3D adds a 3D illusion to the flat-colored image through ControlNet (Zhang et al., 2023), conditioned on structure features such as Canny edges or a depth map. The mesh is then synthesized from the proxy image and textured by baking information from the input image.
  • Figure 4: Qualitative comparisons. We combine our augmentation module with InstantMesh (Xu et al., 2024) and compare against state-of-the-art pre-trained image-to-3D methods: Shap-E (Jun and Nichol, 2023), LN3Diff (Lan et al., 2024), InstantMesh (Xu et al., 2024), 3DTopia-XL (Chen et al., 2024), LGM (Tang et al., 2024), and Trellis (Xiang et al., 2024). All the flat-colored input images are from our curated dataset, Flat-2D. Our augmentation module can significantly improve the geometry quality of generated 3D assets.
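
To make the structure features in Figure 3 concrete, the following sketch computes a simple edge map from a tiny flat-colored grayscale image. It is a deliberately simplified neighbour-difference detector, not the Canny algorithm and not the authors' code; the `edge_map` function, its `threshold` parameter, and the toy image are assumptions made for illustration. It shows that a flat-colored input has no shading gradients, so edges appear only at color boundaries.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of intensities in [0, 255]


def edge_map(img: Image, threshold: int = 32) -> Image:
    """Mark pixels whose intensity differs from the right or lower
    neighbour by more than `threshold` (simplified stand-in for Canny)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Differences to the right and lower neighbours; zero at the borders.
            dx = abs(img[y][x + 1] - img[y][x]) if x + 1 < w else 0
            dy = abs(img[y + 1][x] - img[y][x]) if y + 1 < h else 0
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


# Toy flat-colored image: a light region next to a dark region, no shading.
flat = [
    [200, 200, 50, 50],
    [200, 200, 50, 50],
    [200, 200, 50, 50],
]
edges = edge_map(flat)
for row in edges:
    print(row)
```

In this toy case the only marked pixels lie along the vertical boundary between the two flat regions, which is the kind of outline signal a structure-conditioned generator could use while adding shading of its own.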