Variation-aware Flexible 3D Gaussian Editing
Hao Qin, Yukai Sun, Meng Wang, Ming Kong, Mengxu Lu, Qiang Zhu
TL;DR
The paper tackles cross-view inconsistency and limited flexibility in 3D Gaussian editing by proposing VF-Editor, a native 3D editing framework in which a variation predictor estimates per-primitive attribute variations. The predictor distills multi-source 2D editing priors into a single model composed of a Random Tokenizer, a Variation Field Generation Module, and Iterative Parallel Decoding Functions. It outputs variations $\boldsymbol{\Delta} = \{\delta_{\mu}, \delta_{s}, \delta_{\alpha}, \delta_{c}, \delta_{r}\}$ that are added to the source Gaussians to give the edited result, $\mathcal{X}^{r} = \mathcal{X}^{s} + \boldsymbol{\Delta}$, enabling real-time edits. Key contributions are the variation field formulation, linear-time parallel decoding, and multi-domain knowledge distillation using DDIM, diffusion inversion, and SDS-based strategies. Experiments on public and private datasets show improved Aesthetic and Consistency metrics and diverse editing capabilities. The method supports flexible, open-vocabulary 3D edits, generalizes to unseen data, and yields interpretable variations that can be adjusted and composed across scenes and instructions.
Abstract
Indirect editing methods for 3D Gaussian Splatting (3DGS) have recently witnessed significant advancements. These approaches operate by first applying edits in the rendered 2D space and subsequently projecting the modifications back into 3D. However, this paradigm inevitably introduces cross-view inconsistencies and constrains both the flexibility and efficiency of the editing process. To address these challenges, we present VF-Editor, which enables native editing of Gaussian primitives by predicting attribute variations in a feedforward manner. To accurately and efficiently estimate these variations, we design a novel variation predictor distilled from 2D editing knowledge. The predictor encodes the input to generate a variation field and employs two learnable, parallel decoding functions to iteratively infer attribute changes for each 3D Gaussian. Thanks to its unified design, VF-Editor can seamlessly distill editing knowledge from diverse 2D editors and strategies into a single predictor, allowing for flexible and effective knowledge transfer into the 3D domain. Extensive experiments on both public and private datasets reveal the inherent limitations of indirect editing pipelines and validate the effectiveness and flexibility of our approach.
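The additive update stated in the TL;DR, $\mathcal{X}^{r} = \mathcal{X}^{s} + \boldsymbol{\Delta}$, can be illustrated with a minimal sketch. It is not the authors' implementation: the names `GaussianAttrs` and `apply_variation` are hypothetical, the variation predictor itself is not modeled (the variations are given by hand), and reading the subscripts $\mu, s, \alpha, c, r$ as position, scale, opacity, color, and rotation is an assumption based on the usual 3DGS parameterization.

```python
from dataclasses import dataclass, fields
from typing import List, Union

Attr = Union[float, List[float]]


@dataclass
class GaussianAttrs:
    """Attributes of one Gaussian primitive, or one per-primitive variation.

    Hypothetical container; field meanings are assumptions, not from the paper.
    """
    mu: List[float]     # assumed: position, 3 values
    s: List[float]      # assumed: scale, 3 values
    alpha: float        # assumed: opacity, scalar
    c: List[float]      # assumed: color, 3 values
    r: List[float]      # assumed: rotation quaternion, 4 values


def _add(a: Attr, b: Attr) -> Attr:
    """Add two attributes: scalars directly, vectors element by element."""
    if isinstance(a, (int, float)):
        return a + b
    if len(a) != len(b):
        raise ValueError("attribute lengths differ")
    return [x + y for x, y in zip(a, b)]


def apply_variation(source: List[GaussianAttrs],
                    delta: List[GaussianAttrs]) -> List[GaussianAttrs]:
    """Return X^r = X^s + Delta, one variation per source primitive."""
    if len(source) != len(delta):
        raise ValueError("need exactly one variation per primitive")
    return [
        GaussianAttrs(**{f.name: _add(getattr(g, f.name), getattr(d, f.name))
                         for f in fields(GaussianAttrs)})
        for g, d in zip(source, delta)
    ]


# Hand-written example values; a real predictor would produce `delta`.
source = [GaussianAttrs(mu=[0.0, 1.0, 2.0], s=[1.0, 1.0, 1.0], alpha=0.5,
                        c=[0.25, 0.25, 0.25], r=[1.0, 0.0, 0.0, 0.0])]
delta = [GaussianAttrs(mu=[0.5, 0.0, -1.0], s=[0.0, 0.0, 0.0], alpha=0.25,
                       c=[0.5, 0.0, 0.0], r=[0.0, 0.0, 0.0, 0.0])]
edited = apply_variation(source, delta)
```

Because the update is a plain sum, scaling a variation or summing several variations before applying them is one way the adjustment and composition of edits mentioned in the TL;DR could be realized. The sketch deliberately does not clamp opacity or renormalize rotations, since the text above does not say how those are handled.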
