DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer
Runjia Li, Junlin Han, Luke Melas-Kyriazi, Chunyi Sun, Zhaochong An, Zhongrui Gui, Shuyang Sun, Philip Torr, Tomas Jakab
TL;DR
DreamBeast addresses the challenge of part-level controllability in 3D asset generation by transferring part-aware knowledge from a strong 2D diffusion model (Stable Diffusion 3, SD3) into score distillation sampling (SDS). It extracts Part-Affinity maps from multi-view renderings, learns a Part-Affinity NeRF that interpolates those maps to arbitrary camera views, and uses the learned maps to modulate cross- and self-attention during SDS, producing 3D beasts with part-specific composition. The approach yields higher part correspondence and image quality while substantially reducing compute time relative to naive SD3-based pipelines, as shown by CLIP-based metrics and user studies. The work advances open-world 3D content creation by enabling flexible, part-aware composition of 3D assets with practical runtimes.
Abstract
We present DreamBeast, a novel method based on score distillation sampling (SDS) for generating fantastical 3D animal assets composed of distinct parts. Existing SDS methods often struggle with this generation task due to a limited understanding of part-level semantics in text-to-image diffusion models. While recent diffusion models, such as Stable Diffusion 3, demonstrate a better part-level understanding, they are prohibitively slow and exhibit other common problems associated with single-view diffusion models. DreamBeast overcomes this limitation through a novel part-aware knowledge transfer mechanism. For each generated asset, we efficiently extract part-level knowledge from the Stable Diffusion 3 model into a 3D Part-Affinity implicit representation. This enables us to instantly generate Part-Affinity maps from arbitrary camera views, which we then use to modulate the guidance of a multi-view diffusion model during SDS to create 3D assets of fantastical animals. DreamBeast significantly enhances the quality of generated 3D creatures with user-specified part compositions while reducing computational overhead, as demonstrated by extensive quantitative and qualitative evaluations.
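To make the attention-modulation idea concrete, the following is a minimal sketch of how a Part-Affinity map could bias cross-attention toward the text tokens of a given part. It is an illustration under stated assumptions, not the authors' implementation: the function name `modulate_cross_attention`, the additive bias form, the `strength` parameter, and the toy shapes are all hypothetical, and the method's actual modulation rule may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modulate_cross_attention(logits, affinity, part_tokens, strength=5.0):
    """Bias pre-softmax cross-attention scores with Part-Affinity maps.

    logits:      (num_pixels, num_tokens) raw attention scores.
    affinity:    (num_pixels, num_parts) Part-Affinity values in [0, 1].
    part_tokens: one list of token indices per part (hypothetical mapping).
    strength:    assumed scale of the additive bias (not from the paper).
    """
    biased = logits.copy()
    for part_idx, token_ids in enumerate(part_tokens):
        # Map affinity from [0, 1] to [-1, 1]: boost the part's tokens
        # where the part is present, suppress them elsewhere.
        bias = strength * (2.0 * affinity[:, part_idx] - 1.0)
        biased[:, token_ids] += bias[:, None]
    return softmax(biased, axis=-1)

# Toy example: 4 pixels, 3 text tokens, one part whose prompt token is index 1.
logits = np.zeros((4, 3))
affinity = np.array([[1.0], [1.0], [0.0], [0.0]])  # part covers pixels 0 and 1
attn = modulate_cross_attention(logits, affinity, part_tokens=[[1]])
print(np.round(attn, 3))
```

In this toy setup, pixels inside the part attend almost entirely to the part's token, while pixels outside it nearly ignore that token. In the pipeline described above, the affinity values would come from the learned Part-Affinity representation rendered at the current camera view rather than being set by hand.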
