Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion
Hiromichi Kamata, Yuiko Sakuma, Akio Hayakawa, Masato Ishii, Takuya Narihira
TL;DR
This paper addresses text-guided 3D-to-3D conversion: transforming a given 3D scene into another scene according to a text instruction. It applies pretrained image-to-image diffusion models to obtain high-quality 3D generation, and it conditions explicitly on the source 3D scene to improve 3D consistency and to control how much of the source structure is preserved. It also introduces dynamic scaling, which adjusts the intensity of the geometry transformation. Quantitative and qualitative evaluations show higher-quality conversions than baseline methods. The approach may support interactive, controllable editing of 3D scenes from natural language instructions, with potential applications in content creation and scene modification.
Abstract
We propose Instruct 3D-to-3D, a high-quality 3D-to-3D conversion method. Our method is designed for a novel task: converting a given 3D scene into another scene according to text instructions. Instruct 3D-to-3D applies pretrained image-to-image diffusion models to 3D-to-3D conversion. This enables likelihood maximization for each viewpoint image and high-quality 3D generation. In addition, our method explicitly takes the source 3D scene as a condition, which enhances 3D consistency and makes it possible to control how much of the source scene's structure is reflected in the result. We also propose dynamic scaling, which allows the intensity of the geometry transformation to be adjusted. In quantitative and qualitative evaluations, our method achieves higher-quality 3D-to-3D conversions than baseline methods.
