FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

Yuhan Li, Yishun Dou, Yue Shi, Yu Lei, Xuanhong Chen, Yi Zhang, Peng Zhou, Bingbing Ni

TL;DR

FocalDreamer tackles the challenge of precise, local 3D editing driven by text prompts by decomposing a scene into a fixed base shape and independently learnable parts placed within user-defined focal regions. The method employs a geometry union to fuse editable parts with the base and a dual-path rendering pipeline to separately optimize base and editable textures, guided by score distillation sampling and regularizations that enforce locality and visual coherence. Key contributions include geometric focal loss for localization, collision avoidance, and style consistency regularization, plus a two-stage training regime that yields high-fidelity geometry and PBR textures suitable for standard graphics engines. Extensive experiments and ablations demonstrate superior localized editing performance, strong prompt alignment, and robust base-shape preservation, highlighting its potential to democratize expressive, region-specific 3D content creation.
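
To make the geometry union concrete, here is a minimal sketch under one stated assumption: both the base shape and the editable part are available as signed distance values (negative inside, positive outside), in which case a union is the pointwise minimum. The summary above does not say how FocalDreamer represents geometry, so `sphere_sdf`, `sdf_union`, and the sphere parameters are hypothetical names and values chosen only for illustration.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    offset = points - np.asarray(center, dtype=float)
    return np.linalg.norm(offset, axis=-1) - radius

def sdf_union(sdf_base, sdf_part):
    """Union of two shapes given as signed distances: pointwise minimum."""
    return np.minimum(sdf_base, sdf_part)

# Three query points along the x-axis (illustrative values only).
points = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [3.0, 0.0, 0.0]])

# Hypothetical fixed base shape and a smaller editable part that overlaps it.
base = sphere_sdf(points, center=(0.0, 0.0, 0.0), radius=1.0)
part = sphere_sdf(points, center=(1.5, 0.0, 0.0), radius=0.5)

merged = sdf_union(base, part)
inside = merged < 0  # True where a point lies inside the merged shape
```

The second query point lies outside the base but inside the part, so it counts as inside the merged shape. The base values are never modified, which mirrors the idea of a fixed base with independently learnable parts.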

Abstract

While text-driven 3D editing has made significant strides in leveraging score distillation sampling, emerging approaches still fall short in delivering the separable, precise, and consistent outcomes that are vital to content creation. In response, we introduce FocalDreamer, a framework that merges a base shape with editable parts according to text prompts for fine-grained editing within desired regions. Specifically, equipped with geometry union and dual-path rendering, FocalDreamer assembles independent 3D parts into a complete object, tailored for convenient instance reuse and part-wise control. We propose geometric focal loss and style consistency regularization, which encourage focal fusion and a congruent overall appearance. Furthermore, FocalDreamer generates high-fidelity geometry and PBR textures that are compatible with widely used graphics engines. Extensive experiments highlight the superior editing capabilities of FocalDreamer in both quantitative and qualitative evaluations.

Paper Structure

This paper contains 12 sections, 8 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: Given the prompt "a butterfly over a tree stump", our method delivers high-fidelity geometry and photorealistic appearance using PBR materials. Rows (b-c) showcase instance reuse and part-wise material control, underscoring FocalDreamer's capability for separable and precise edits.
  • Figure 2: FocalDreamer can produce meticulously detailed and photorealistic 3D edits. The left column displays base meshes with focal regions. The three right columns showcase the edited overall appearance, the assembled geometry, and the editable part.
  • Figure 3: An overview of FocalDreamer. (a) During geometry learning, given a base shape, we first initialize an ellipsoid as the editable geometry within each focal region. We then render the normal map of the merged shape as the shape encoding for pre-trained T2I models, to optimize the editable geometry according to the prompts. (b) During appearance learning, the resultant shape is rendered in a dual-path manner with base and editable textures. The outcomes are then blended by a Pixel-wise Discriminative Mask for a unified appearance. (c) Several regularizations are introduced to improve the editing quality, including $\mathcal{L}_{GF}$, $\mathcal{L}_{CA}$, and $\mathcal{L}_{SC}$.
  • Figure 4: Visual comparison. Our approach synthesizes high-quality edits while preserving the base mesh perfectly.
  • Figure 5: Comparison with SOTA image editing methods. The gray areas in the input images indicate the inpainting region specifically added for ControlNet. We observe that 2D editing methods exhibit view inconsistency, and their editing quality varies notably depending on the viewpoint.
  • ...and 4 more figures
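
The dual-path blending described in the Figure 3 caption can be sketched as a per-pixel selection between two renders. This is only an illustration: the caption does not say how the Pixel-wise Discriminative Mask is computed, so the sketch assumes a binary mask is already given (1 where a pixel belongs to an editable part, 0 where it belongs to the base). The function name `blend_dual_path` and all array values are hypothetical.

```python
import numpy as np

def blend_dual_path(base_rgb, edit_rgb, edit_mask):
    """Blend two renders per pixel: edit_rgb where mask is 1, base_rgb elsewhere.

    base_rgb, edit_rgb: float arrays of shape (H, W, 3)
    edit_mask: array of shape (H, W) with values in {0, 1}
    """
    mask = edit_mask[..., None].astype(base_rgb.dtype)  # broadcast over RGB
    return mask * edit_rgb + (1.0 - mask) * base_rgb

# Tiny 2x2 example with constant-colored renders (illustrative values only).
base_rgb = np.full((2, 2, 3), 0.2)   # render using the base texture
edit_rgb = np.full((2, 2, 3), 0.8)   # render using the editable texture
edit_mask = np.array([[1, 0],
                      [0, 1]])       # assumed given; not derived here

blended = blend_dual_path(base_rgb, edit_rgb, edit_mask)
```

Pixels with mask 1 take the editable-texture render and the rest keep the base render, so gradients from an image-space loss on `blended` would reach each texture only through its own pixels.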