ControlDreamer: Blending Geometry and Style in Text-to-3D
Yeongtak Oh, Jooyoung Choi, Yongsung Kim, Minjun Park, Chaehun Shin, Sungroh Yoon
TL;DR
ControlDreamer addresses the challenge of jointly controlling geometry and style in text-to-3D generation with a two-stage pipeline. The first stage builds coarse geometry with NeRF from a geometry prompt; the second refines a textured mesh via DMTet, guided by a depth-aware multi-view ControlNet (MV-ControlNet) trained on multi-view data generated from a carefully curated text corpus. Compared with existing text-to-3D methods, the approach scores better on directional CLIP similarity and in human evaluations, and it produces stronger qualitative results. The work also introduces a benchmark for 3D style editing. Together, these contributions improve multi-view consistency and enable faithful, text-guided stylization of 3D assets, with potential impact on 3D content creation pipelines and tools.
Abstract
Recent advancements in text-to-3D generation have significantly contributed to the automation and democratization of 3D content creation. Building upon these developments, we aim to address the limitations of current methods in blending geometries and styles in text-to-3D generation. We introduce multi-view ControlNet, a novel depth-aware multi-view diffusion model trained on generated datasets from a carefully curated text corpus. Our multi-view ControlNet is then integrated into our two-stage pipeline, ControlDreamer, enabling text-guided generation of stylized 3D models. Additionally, we present a comprehensive benchmark for 3D style editing, encompassing a broad range of subjects, including objects, animals, and characters, to further facilitate research on diverse 3D generation. Our comparative analysis reveals that this new pipeline outperforms existing text-to-3D methods as evidenced by human evaluations and CLIP score metrics. Project page: https://controldreamer.github.io
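For readers unfamiliar with the directional CLIP similarity mentioned in the TL;DR, the following is a minimal sketch of how such a score is commonly computed: the cosine similarity between the change in image embedding (edited render minus source render) and the change in text embedding (style prompt minus geometry prompt). It assumes the CLIP embeddings have already been extracted and are passed in as plain vectors. The function names and the per-view averaging are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def directional_clip_similarity(img_src, img_edit, txt_src, txt_edit, eps=1e-8):
    """Cosine similarity between the image-space and text-space edit directions.

    All four arguments are 1-D embedding vectors of equal length,
    assumed to come from the same CLIP model (extraction not shown).
    """
    d_img = np.asarray(img_edit, dtype=np.float64) - np.asarray(img_src, dtype=np.float64)
    d_txt = np.asarray(txt_edit, dtype=np.float64) - np.asarray(txt_src, dtype=np.float64)
    denom = np.linalg.norm(d_img) * np.linalg.norm(d_txt)
    # Guard against a zero-length direction (e.g. identical source and edit).
    return float(np.dot(d_img, d_txt) / max(denom, eps))


def mean_directional_similarity(views_src, views_edit, txt_src, txt_edit):
    """Average the score over paired rendered views (hypothetical protocol)."""
    scores = [
        directional_clip_similarity(s, e, txt_src, txt_edit)
        for s, e in zip(views_src, views_edit)
    ]
    return float(np.mean(scores))
```

A score near 1 means the rendered asset changed in the direction the text edit describes; a score near 0 means the visual change is unrelated to the prompt change.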
