Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization

Yige Chen, Teng Hu, Yizhe Tang, Siyuan Chen, Ang Chen, Ran Yi

TL;DR

Plasticine3D tackles the challenge of text-guided 3D non-rigid editing with large structure changes by decoupling geometry and texture editing and introducing three key innovations: Multi-View-Embedding (MVE) Optimization to capture cross-view object features, Embedding-Fusion (EF) for controllable editing strength, and Score Projection Sampling (SPS) to stabilize large deformations while preserving detail. A supplementary multi-view normal-depth diffusion guidance further enforces geometric consistency. In experiments against Vox-E and DreamBooth3D baselines, Plasticine3D achieves superior editing accuracy and deformation capability, validated by higher CLIP_sim and CLIP_dir scores and qualitative results. The work offers a practical, fine-grained framework for editing 3D assets with text prompts, enabling more flexible asset customization in applications like games and virtual environments.
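
The summary above names CLIP_sim and CLIP_dir without defining them. The sketch below shows one common reading of such metrics and is not taken from the paper. It assumes CLIP_sim is the cosine similarity between the edited render's image embedding and the target prompt's text embedding, and CLIP_dir is the cosine similarity between the image-space change and the text-space change. The function names and toy vectors are hypothetical, and real embeddings would come from a CLIP encoder.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def clip_sim(img_edit: Sequence[float], txt_target: Sequence[float]) -> float:
    """Assumed definition: edited image embedding vs. target text embedding."""
    return cosine(img_edit, txt_target)


def clip_dir(img_src: Sequence[float], img_edit: Sequence[float],
             txt_src: Sequence[float], txt_target: Sequence[float]) -> float:
    """Assumed definition: does the image change point the same way as the text change?"""
    d_img = [e - s for s, e in zip(img_src, img_edit)]
    d_txt = [t - s for s, t in zip(txt_src, txt_target)]
    return cosine(d_img, d_txt)


# Toy 2-D stand-ins for CLIP embeddings (hypothetical values).
score = clip_dir(img_src=[1.0, 0.0], img_edit=[1.0, 1.0],
                 txt_src=[2.0, 0.0], txt_target=[2.0, 3.0])
print(score)
```

Under this reading, a directional score near 1 means the edit moved the render in the direction the prompt change asked for, independent of how similar the final image is to the prompt overall.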

Abstract

With the help of Score Distillation Sampling (SDS) and the rapid development of neural 3D representations, several methods have been proposed to perform 3D editing, such as adding geometry or overwriting textures. However, the generalized 3D non-rigid editing task, which requires changing both the structure (posture or composition) and the appearance (texture) of the original object, remains challenging in the 3D editing field. In this paper, we propose Plasticine3D, a novel text-guided 3D editing pipeline with fine-grained control that can perform 3D non-rigid editing with large structure deformations. Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance. To maintain the details of the original object from different viewpoints, we propose a Multi-View-Embedding (MVE) Optimization strategy, which ensures that the guidance model learns the features of the original object from various viewpoints. For fine-grained control, we propose Embedding-Fusion (EF), which blends the original characteristics with the editing objectives in the embedding space and controls the extent of editing by adjusting the fusion rate. Furthermore, to address the gradual loss of detail during generation under high editing intensity, as well as insignificant editing effects in some scenarios, we propose Score Projection Sampling (SPS) as a replacement for Score Distillation Sampling. SPS introduces additional optimization phases for editing target enhancement and original detail maintenance, leading to better editing quality. Extensive experiments demonstrate the effectiveness of our method on 3D non-rigid editing tasks.
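
The abstract describes Embedding-Fusion as a blend in embedding space controlled by a fusion rate, but does not give the formula. The sketch below assumes the simplest form, a linear interpolation between a source embedding and a target text embedding. The function name, the direction of the rate (0 keeps the source, 1 gives the target), and the toy vectors are all assumptions for illustration, not the paper's definition.

```python
from typing import List, Sequence


def fuse_embeddings(source: Sequence[float], target: Sequence[float],
                    fusion_rate: float) -> List[float]:
    """Linearly blend two embedding vectors (assumed form of the fusion).

    fusion_rate = 0.0 returns the source embedding unchanged;
    fusion_rate = 1.0 returns the target embedding.
    """
    if len(source) != len(target):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= fusion_rate <= 1.0:
        raise ValueError("fusion_rate must lie in [0, 1]")
    return [(1.0 - fusion_rate) * s + fusion_rate * t
            for s, t in zip(source, target)]


# Toy stand-ins: one optimized per-view embedding and one target text embedding.
source_embedding = [1.0, 0.0, 2.0]
target_embedding = [0.0, 1.0, 2.0]

for rate in (0.0, 0.5, 1.0):
    print(rate, fuse_embeddings(source_embedding, target_embedding, rate))
```

In this reading, sweeping the rate from 0 to 1 moves the guidance from "reproduce the original object" towards "match the target prompt", which is the kind of editing-strength control the abstract refers to.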

Paper Structure (24 sections, 12 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Our Plasticine3D achieves text-guided fine-grained controlled 3D editing and enables non-rigid editing with large deformations. For each case, the original object's RGB and normal maps are displayed on the left side, with the target prompt below, highlighting the main editing objective in orange. The final edited result's RGB and normal maps are on the right.
  • Figure 2: Framework of Plasticine3D. We first optimize a set of trainable Multi-View Embeddings (MVE) using the rendered images of the original object from multiple views to capture the features of the object. We also finetune a Stable Diffusion model based on the renderings of the original object. Then we fuse the optimized MVE with the target text embedding to get fused embeddings as the semantic guidance in our editing. Finally, we edit the object with our novel three-phase Score Projection Sampling (SPS), which enhances both the editing target features and source details.
  • Figure 3: Qualitative comparisons: Vox-E (global) and Vox-E (local) lack the ability to handle 3D non-rigid editing involving large structure deformations. DreamBooth3D has some ability to perform large structure deformations, but lacks stability and suffers from low quality. In contrast, our Plasticine3D achieves good performance in editing accuracy and large deformations.
  • Figure 4: Ablation experiment for components in MVE optimization. Without multi-view embeddings, the editing results suffer from serious geometric inconsistency (three legs). Without UNet finetuning, the editing results are distorted and lose original details.
  • Figure 5: Ablation studies on Score Projection Sampling (SPS). Without the target enhancement phase in SPS, the editing results show poor alignment with the editing target. Without the detail enhancement phase in SPS, the editing results lose the details of the original object.
  • ...and 2 more figures