Plasticine3D: 3D Non-Rigid Editing with Text Guidance by Multi-View Embedding Optimization
Yige Chen, Teng Hu, Yizhe Tang, Siyuan Chen, Ang Chen, Ran Yi
TL;DR
Plasticine3D tackles the challenge of text-guided 3D non-rigid editing with large structure changes by decoupling geometry and texture editing and introducing three key innovations: Multi-View-Embedding (MVE) Optimization to capture cross-view object features, Embedding-Fusion (EF) for controllable editing strength, and Score Projection Sampling (SPS) to stabilize large deformations while preserving detail. A supplementary multi-view normal-depth diffusion guidance further enforces geometric consistency. In experiments against Vox-E and DreamBooth3D baselines, Plasticine3D achieves superior editing accuracy and deformation capability, validated by higher CLIP_sim and CLIP_dir scores and qualitative results. The work offers a practical, fine-grained framework for editing 3D assets with text prompts, enabling more flexible asset customization in applications like games and virtual environments.
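For readers unfamiliar with the two metrics named above, the sketch below shows how CLIP-based similarity and directional similarity are commonly computed. This section does not give the paper's exact definitions, so the formulas here are assumptions based on common usage: `clip_sim` is taken as the cosine similarity between an edited-render embedding and the target-prompt embedding, and `clip_dir` as the cosine similarity between the image-space edit direction and the text-space edit direction. The function names and the toy vectors are hypothetical; real embeddings would come from a CLIP encoder.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def clip_sim(edited_image_emb: Sequence[float],
             target_text_emb: Sequence[float]) -> float:
    """Assumed definition: how well the edited render matches the target prompt."""
    return cosine(edited_image_emb, target_text_emb)


def clip_dir(source_image_emb: Sequence[float],
             edited_image_emb: Sequence[float],
             source_text_emb: Sequence[float],
             target_text_emb: Sequence[float]) -> float:
    """Assumed definition: alignment of the image-space change with the text-space change."""
    image_delta = [e - s for e, s in zip(edited_image_emb, source_image_emb)]
    text_delta = [t - s for t, s in zip(target_text_emb, source_text_emb)]
    return cosine(image_delta, text_delta)


# Toy 2-D embeddings (hypothetical values, for illustration only).
aligned = clip_dir([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [2.0, 0.0])
orthogonal = clip_sim([1.0, 0.0], [0.0, 1.0])
```

In this toy case the image and text both move along the same axis, so the directional score is at its maximum, while the two orthogonal vectors give a similarity of zero.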
Abstract
With the help of Score Distillation Sampling (SDS) and the rapid development of neural 3D representations, several methods have been proposed for 3D editing tasks such as adding geometry or overwriting textures. However, general 3D non-rigid editing, which requires changing both the structure (posture or composition) and the appearance (texture) of the original object, remains challenging. In this paper, we propose Plasticine3D, a novel text-guided 3D editing pipeline with fine-grained control that can perform 3D non-rigid editing with large structural deformations. Our work divides the editing process into a geometry editing stage and a texture editing stage to achieve separate control of structure and appearance. To preserve the details of the original object across viewpoints, we propose a Multi-View-Embedding (MVE) Optimization strategy that ensures the guidance model learns the features of the original object from various viewpoints. For fine-grained control, we propose Embedding-Fusion (EF), which blends the original characteristics with the editing objectives in the embedding space and controls the extent of editing by adjusting the fusion rate. Furthermore, to address the gradual loss of detail during generation under high editing intensity, as well as insignificant editing effects in some scenarios, we propose Score Projection Sampling (SPS) as a replacement for SDS. SPS introduces additional optimization phases for editing target enhancement and original detail maintenance, leading to better editing quality. Extensive experiments demonstrate the effectiveness of our method on 3D non-rigid editing tasks.
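The idea of controlling editing strength through a fusion rate can be illustrated with a minimal sketch. The abstract does not specify the fusion rule, so the linear interpolation below is an assumption, as are the function name `fuse_embeddings`, the parameter `fusion_rate`, and the toy vectors; in practice the inputs would be the optimized embedding of the original object and the embedding of the edit prompt.

```python
from typing import List, Sequence


def fuse_embeddings(original: Sequence[float],
                    target: Sequence[float],
                    fusion_rate: float) -> List[float]:
    """Blend two embeddings; this linear rule is an assumed stand-in for EF.

    fusion_rate = 0.0 returns the original embedding (no edit),
    fusion_rate = 1.0 returns the target embedding (full edit).
    """
    if len(original) != len(target):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= fusion_rate <= 1.0:
        raise ValueError("fusion_rate must lie in [0, 1]")
    return [(1.0 - fusion_rate) * o + fusion_rate * t
            for o, t in zip(original, target)]


# Toy 3-D embeddings (hypothetical values, for illustration only).
original_emb = [1.0, 0.0, 2.0]
target_emb = [0.0, 1.0, 4.0]

# A low fusion rate keeps the result close to the original object.
fused = fuse_embeddings(original_emb, target_emb, 0.25)
```

Sweeping `fusion_rate` from 0 to 1 in such a scheme moves the guidance signal smoothly from the original object toward the edit target, which is one way to read the "extent of editing" described above.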
