C3DAG: Controlled 3D Animal Generation using 3D pose guidance
Sandeep Mishra, Oindrila Saha, Alan C. Bovik
TL;DR
C3DAG addresses anatomically accurate 3D animal generation from text and pose with a two-stage, diffusion-based pipeline. The first stage initializes a NeRF from a naive balloon-shaped 3D mesh built from a 3D pose, using depth-guided Score Distillation Sampling (SDS). The second stage refines the NeRF with pose-guided SDS, where the guidance comes from a 2D Tetrapod-pose ControlNet trained on diverse animal keypoints. The approach combines an automatic 3D shape creator with a specialized control network to produce high-fidelity, pose-consistent 3D animals across mammals, reptiles, birds, and amphibians, with substantially faster runtimes than prior state-of-the-art methods. The result is precise, controllable 3D animal generation suitable for animation and rendering, supported by a web-based tool for interactive pose and shape manipulation.
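To make the two-stage idea concrete, here is a minimal sketch of an SDS-style update applied first with a depth control and then with a pose control. It is an illustration under stated assumptions, not the authors' implementation: the weighting w(t) = 1 - alpha_bar_t, the noise schedule, the step size, and every name (`sds_grad`, `make_toy_denoiser`, `depth_map`, `pose_map`) are hypothetical. The toy denoiser stands in for a ControlNet-conditioned diffusion model, and the "rendering" is optimized directly instead of backpropagating through a NeRF.

```python
import numpy as np

def sds_grad(x, eps_pred_fn, t, alpha_bar, rng, control):
    """SDS-style gradient w(t) * (eps_hat - eps) with respect to the rendering x."""
    eps = rng.standard_normal(x.shape)                 # noise added to the rendering
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps      # forward diffusion to step t
    eps_hat = eps_pred_fn(x_t, t, control)             # conditioned noise prediction
    return (1.0 - a) * (eps_hat - eps)                 # assumed weighting w(t) = 1 - a

def make_toy_denoiser(alpha_bar):
    """Toy stand-in for a conditioned diffusion model (assumption):
    it behaves as if the only plausible clean image were the control itself."""
    def eps_pred_fn(x_t, t, control):
        a = alpha_bar[t]
        return (x_t - np.sqrt(a) * control) / np.sqrt(1.0 - a)
    return eps_pred_fn

rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.999, 0.01, 1000)             # hypothetical noise schedule
denoiser = make_toy_denoiser(alpha_bar)

depth_map = np.full((8, 8), 0.2)   # stage 1 control: depth of the balloon shape
pose_map = np.full((8, 8), 0.8)    # stage 2 control: rendered 2D pose

x = np.zeros((8, 8))               # stands in for a NeRF rendering
errors = []
for control in (depth_map, pose_map):                  # stage 1, then stage 2
    for _ in range(200):
        t = int(rng.integers(20, 980))                 # random diffusion step
        x = x - 0.5 * sds_grad(x, denoiser, t, alpha_bar, rng, control)
    errors.append(float(np.abs(x - control).max()))
```

In this toy setting the rendering is first pulled toward the depth control and then, starting from that initialization, toward the pose control, which mirrors the coarse-to-fine ordering described above. In a real system the gradient would be chained through the renderer into the NeRF parameters.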
Abstract
Recent advancements in text-to-3D generation have demonstrated the ability to generate high-quality 3D assets. However, these methods underperform when generating animals, often producing inaccurate anatomy and geometry. To address this shortcoming, we present C3DAG, a novel pose-Controlled text-to-3D Animal Generation framework that generates a high-quality 3D animal consistent with a given pose. We also introduce an automatic, web-based 3D shape creator that allows dynamic pose generation and modification and that generates a 3D balloon animal from simple geometries. A NeRF is then initialized from this 3D shape using depth-controlled SDS. In the next stage, the pre-trained NeRF is fine-tuned using quadruped-pose-controlled SDS. Our pipeline not only produces geometrically and anatomically consistent results, but also renders highly controlled 3D animals, unlike prior methods, which do not allow fine-grained pose control.
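As a rough illustration of how a balloon animal could be assembled from simple geometries around a 3D pose, the sketch below represents each bone as a capsule and takes the union of their signed distance fields. This is an assumption made for illustration only: the abstract does not specify which primitives the shape creator uses, and the skeleton, radii, and function names (`capsule_sdf`, `balloon_sdf`) are hypothetical.

```python
import numpy as np

def capsule_sdf(points, a, b, radius):
    """Signed distance from each point to a capsule whose axis runs from a to b."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    ap = points - a
    if denom == 0.0:
        t = np.zeros(len(points))                      # degenerate bone: a sphere at a
    else:
        t = np.clip(ap @ ab / denom, 0.0, 1.0)         # projection onto the segment
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1) - radius

def balloon_sdf(points, joints, bones, radii):
    """Union of capsules: the minimum of the per-bone signed distances."""
    dists = [capsule_sdf(points, joints[i], joints[j], r)
             for (i, j), r in zip(bones, radii)]
    return np.min(np.stack(dists, axis=0), axis=0)

# Hypothetical toy skeleton (3D joint positions, not from the paper).
joints = np.array([
    [0.0, 0.0, 0.0],    # hip
    [1.0, 0.0, 0.0],    # shoulder
    [1.4, 0.3, 0.0],    # head
    [0.0, -0.8, 0.0],   # hind foot
    [1.0, -0.8, 0.0],   # front foot
])
bones = [(0, 1), (1, 2), (0, 3), (1, 4)]   # spine, neck, hind leg, front leg
radii = [0.25, 0.15, 0.08, 0.08]           # thicker torso, thinner limbs

query = np.array([[0.5, 0.0, 0.0],         # on the spine axis, inside the torso
                  [0.5, 1.0, 0.0]])        # well above the body
d = balloon_sdf(query, joints, bones, radii)
```

Negative values of `d` lie inside the balloon shape and positive values outside. Editing a joint position moves the attached capsules with it, which is one simple way a pose edit could propagate to the coarse shape used for depth rendering.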
