Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning

Wenqi Dong, Bangbang Yang, Lin Ma, Xiao Liu, Liyuan Cui, Hujun Bao, Yuewen Ma, Zhaopeng Cui

TL;DR

Coin3D addresses the lack of controllable, interactive 3D asset generation by introducing 3D-aware proxy-guided conditioning within a multiview diffusion framework. A 3D adapter integrates voxelized proxy features into the diffusion process, complemented by proxy-bounded editing, progressive volume caching, and volume-SDS for improved reconstruction quality. Empirical results show improved 3D controllability, quicker feedback for interactive edits, and superior reconstruction quality compared with image-based baselines and existing controllable 3D methods. The approach enables practical, view-consistent 3D asset creation with rapid previews and local editing, advancing user-centric generative design in 3D.

Abstract

As humans, we aspire to create media content that is both freely willed and readily controlled. Thanks to the prominent development of generative techniques, we now can easily utilize 2D diffusion methods to synthesize images controlled by raw sketch or designated human poses, and even progressively edit/regenerate local regions with masked inpainting. However, similar workflows in 3D modeling tasks are still unavailable due to the lack of controllability and efficiency in 3D generation. In this paper, we present a novel controllable and interactive 3D assets modeling framework, named Coin3D. Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes, and introduces an interactive generation workflow to support seamless local part editing while delivering responsive 3D object previewing within a few seconds. To this end, we develop several techniques, including the 3D adapter that applies volumetric coarse shape control to the diffusion model, proxy-bounded editing strategy for precise part editing, progressive volume cache to support responsive preview, and volume-SDS to ensure consistent mesh reconstruction. Extensive experiments of interactive generation and editing on diverse shape proxies demonstrate that our method achieves superior controllability and flexibility in the 3D assets generation task.
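
As a rough illustration of what "a coarse geometry proxy assembled from basic shapes" could look like once turned into a volumetric control signal, the sketch below voxelizes a handful of primitives into a binary occupancy grid. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Sphere` and `Box` classes, the `voxelize_proxy` function, the `[-1, 1]^3` bounds, and the grid resolution are all hypothetical choices made for the example. The paper's 3D adapter consumes voxelized proxy features, which are richer than the plain occupancy values used here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Sphere:
    """Hypothetical sphere primitive, defined by center and radius."""
    center: Vec3
    radius: float

    def contains(self, p: Vec3) -> bool:
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, self.center))
        return dist_sq <= self.radius ** 2


@dataclass
class Box:
    """Hypothetical axis-aligned box primitive, defined by two corners."""
    lo: Vec3
    hi: Vec3

    def contains(self, p: Vec3) -> bool:
        return all(l <= a <= h for a, l, h in zip(p, self.lo, self.hi))


def voxelize_proxy(shapes, resolution: int = 16) -> List[List[List[int]]]:
    """Sample voxel centers in [-1, 1]^3 (assumed bounds) and mark a voxel
    as occupied (1) if its center lies inside any primitive of the proxy."""
    step = 2.0 / resolution
    coords = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    return [
        [
            [1 if any(s.contains((x, y, z)) for s in shapes) else 0 for z in coords]
            for y in coords
        ]
        for x in coords
    ]


def occupied_count(grid) -> int:
    """Count occupied voxels in the grid."""
    return sum(v for plane in grid for row in plane for v in row)


# A toy proxy: a flat box with a small sphere sitting above it.
proxy = [
    Box((-0.5, -0.5, -0.5), (0.5, 0.5, 0.0)),
    Sphere((0.0, 0.0, 0.4), 0.3),
]
grid = voxelize_proxy(proxy, resolution=16)
```

Taking the union of primitives keeps the proxy easy to edit: adding, moving, or removing one shape only changes the voxels that shape covers.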

Paper Structure

This paper contains 43 sections, 5 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: Overview. Given a coarse shape proxy and user prompts that describe the identity, our method first constructs 2D image candidates from the proxy's silhouette and 3D proxy samples as input conditions. Then, we employ a 3D adapter to integrate 3D-aware control into the diffusion denoising process with a 3D control volume $F_C$, yielding multiview images of the object. By fully leveraging $F_C$, we realize accelerated 3D previewing with a volume cache and also improve mesh reconstruction quality.
  • Figure 2: Proxy-bounded part editing. We update the 2D image condition and 3D control volume with masks from users' part annotation of the proxy.
  • Figure 3: We compare our proxy-based generation method with image-based methods (i.e., Wonder3D and SyncDreamer) on the generated multiview images and reconstructed textured mesh.
  • Figure 4: We compare our controllable 3D generation with Latent-NeRF and Fantasia3D.
  • Figure 5: We conduct interactive generation with part editing on several basic shape proxies.
  • ...and 10 more figures
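
The proxy-bounded part editing described in the Figure 2 caption updates the 3D control volume using masks derived from the user's part annotation. One simple way to picture such an update is a per-voxel blend that takes new values inside the mask and keeps existing values outside it. The sketch below assumes exactly that linear blend, with one scalar per voxel; the `blend_volumes` function and the toy data are hypothetical, and the source does not confirm that the method uses this formula.

```python
from typing import List

Volume = List[List[List[float]]]


def blend_volumes(cached: Volume, edited: Volume, mask: Volume) -> Volume:
    """Assumed update rule: where mask is 1 take the edited value,
    where mask is 0 keep the cached value (linear blend in between)."""
    return [
        [
            [m * e + (1.0 - m) * c for c, e, m in zip(c_row, e_row, m_row)]
            for c_row, e_row, m_row in zip(c_plane, e_plane, m_plane)
        ]
        for c_plane, e_plane, m_plane in zip(cached, edited, mask)
    ]


def filled(value: float, n: int = 2) -> Volume:
    """Build an n x n x n volume filled with a constant (toy data)."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


cached_volume = filled(1.0)   # stands in for the existing control volume
edited_volume = filled(5.0)   # stands in for the volume after a part edit
part_mask = filled(0.0)
part_mask[0][0][0] = 1.0      # the user annotated a single voxel for editing

updated = blend_volumes(cached_volume, edited_volume, part_mask)
```

With this rule, voxels outside the annotated part are left untouched, which fits the caption's description of edits that are bounded by the proxy part.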