Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Zhenwei Wang, Tengfei Wang, Zexin He, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau
TL;DR
Phidias targets high-quality, generalizable, and controllable 3D generation from text, image, or 3D conditions. It is a reference-augmented diffusion framework: a multi-view diffusion model is conditioned on a 3D reference through canonical coordinate maps (CCMs), with three supporting components (meta-ControlNet, dynamic reference routing, and self-reference augmentation), and the generated views are then lifted to 3D by sparse-view reconstruction. The authors present it as the first reference-based 3D-aware diffusion model and report improvements over state-of-the-art baselines on image-to-3D generation. Applications include text-to-3D, retrieval-augmented generation, and interactive 3D creation, all handled by one controllable pipeline that draws on external references and self-supervised training to improve realism and generalization.
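As a rough illustration of what a canonical coordinate map holds, the sketch below rasterizes a point cloud into an image whose pixel values are normalized canonical xyz coordinates. It is a minimal NumPy sketch, not the Phidias renderer: the function name `render_ccm`, the point-cloud input, and the orthographic view along the z axis are all assumptions made for illustration.

```python
import numpy as np

def render_ccm(points, size=64):
    """Rasterize an orthographic canonical coordinate map (illustrative only).

    points: (N, 3) xyz coordinates in the object's canonical frame.
    Returns (ccm, mask): ccm is (size, size, 3) with normalized xyz stored
    as RGB in [0, 1]; mask is True where a point landed on the pixel.
    """
    points = np.asarray(points, dtype=np.float64)
    lo = points.min(axis=0)
    # One scale for all axes keeps the aspect ratio of the shape.
    scale = max(float((points.max(axis=0) - lo).max()), 1e-12)
    coords = (points - lo) / scale  # every coordinate now lies in [0, 1]

    # Assumed camera: looks down the z axis; x maps to columns, y to rows.
    cols = np.rint(coords[:, 0] * (size - 1)).astype(int)
    rows = np.rint((1.0 - coords[:, 1]) * (size - 1)).astype(int)

    ccm = np.zeros((size, size, 3))
    depth = np.full((size, size), -np.inf)
    for r, c, p in zip(rows, cols, coords):
        # Simple z-buffer: keep the point with the largest z per pixel.
        if p[2] > depth[r, c]:
            depth[r, c] = p[2]
            ccm[r, c] = p
    return ccm, np.isfinite(depth)

# Three points; two share a pixel and the one with larger z wins.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
ccm, mask = render_ccm(pts, size=4)
```

Because each pixel encodes a position rather than an appearance, a map like this can carry the reference geometry into an image-space diffusion model. A production renderer would rasterize mesh faces under the same cameras as the generated views instead of splatting points.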
Abstract
In 3D modeling, designers often use an existing 3D model as a reference to create new ones. This practice has inspired the development of Phidias, a novel generative model that uses diffusion for reference-augmented 3D generation. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing the generation quality, generalization ability, and controllability. Our model integrates three key components: 1) meta-ControlNet that dynamically modulates the conditioning strength, 2) dynamic reference routing that mitigates misalignment between the input image and 3D reference, and 3) self-reference augmentations that enable self-supervised training with a progressive curriculum. Collectively, these designs result in a clear improvement over existing methods. Phidias establishes a unified framework for 3D generation using text, image, and 3D conditions with versatile applications.
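The idea of self-reference augmentation under a progressive curriculum can be sketched as follows. The abstract does not specify the augmentations or the schedule, so everything here is an assumption for illustration: the linear ramp in `curriculum_strength`, the random pixel shift in `augment_reference`, and both function names. The premise is that the ground-truth shape serves as its own reference and is perturbed more strongly as training proceeds, so the model learns to tolerate misaligned references.

```python
import numpy as np

def curriculum_strength(step, total_steps, max_strength=1.0):
    """Linearly ramp augmentation strength from 0 to max_strength (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_strength * frac

def augment_reference(ref_map, strength, rng):
    """Misalign a reference map of shape (H, W, C) by a random shift.

    The shift is at most strength * W / 4 pixels per axis (hypothetical bound).
    np.roll wraps around the border, a crude stand-in for a real warp.
    """
    max_shift = int(round(strength * ref_map.shape[1] / 4))
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(ref_map, shift=(int(dy), int(dx)), axis=(0, 1))

rng = np.random.default_rng(0)
ref = rng.random((8, 8, 3))  # stand-in for a map rendered from the ground-truth shape
for step in (0, 500, 1000):
    s = curriculum_strength(step, total_steps=1000)
    aug = augment_reference(ref, s, rng)  # more misalignment later in training
```

At strength 0 the reference is returned unchanged, which corresponds to the easiest case where the reference matches the target exactly. Later steps widen the gap between reference and target.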
