3D Gaussian Editing with A Single Image

Guan Luo; Tian-Xing Xu; Ying-Tian Liu; Xiao-Xiong Fan; Fang-Lue Zhang; Song-Hai Zhang

TL;DR

This work presents the first single-image-driven 3D scene editing method based on 3D Gaussian Splatting: the scene is edited so that its rendering from a user-specified viewpoint matches an edited 2D image. The method has three components: a positional loss $\mathcal{L}_u$, derived from optimal transport, that captures long-range deformation, with gradients propagated through a reparameterization; an anchor-based as-rigid-as-possible (ARAP) regularization with a coarse-to-fine strategy that stabilizes occluded regions and long-range edits; and an adaptive rigidity masking mechanism that handles non-rigid regions with explicit rotation and distance supervision. The approach achieves better geometry and texture editing than previous approaches, improves alignment to reference edits, and supports single-view video tracking with temporal consistency. By enabling detailed, controllable edits driven directly by 2D images, the method makes 3D content generation more intuitive, with practical implications for film, gaming, and AR/VR content creation.
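To illustrate why a transport-based positional loss can capture long-range deformation, here is a minimal toy sketch in one dimension. It is not the paper's $\mathcal{L}_u$: the function names `ot_match_1d` and `positional_gradient` are hypothetical, and the setup assumes equal-size point sets with uniform weights and a squared-distance cost, for which sorting both sets gives the optimal matching.

```python
def ot_match_1d(source, target):
    """Optimal matching of two equal-size, uniform-weight 1D point sets.

    Assumption for this toy sketch: with a squared-distance cost in 1D,
    pairing the points in sorted order is optimal.
    Returns (pairs, cost), where pairs holds (source_index, target_index)
    and cost is the mean squared displacement of the matching.
    """
    if len(source) != len(target):
        raise ValueError("point sets must have the same size")
    src_order = sorted(range(len(source)), key=lambda i: source[i])
    tgt_order = sorted(range(len(target)), key=lambda j: target[j])
    pairs = list(zip(src_order, tgt_order))
    cost = sum((source[i] - target[j]) ** 2 for i, j in pairs) / len(pairs)
    return pairs, cost


def positional_gradient(source, target):
    """Gradient of the matching cost with respect to each source position.

    The matching is held fixed while differentiating, so every source
    point receives a gradient that points away from its matched target.
    """
    pairs, _ = ot_match_1d(source, target)
    n = len(source)
    grad = [0.0] * n
    for i, j in pairs:
        grad[i] = 2.0 * (source[i] - target[j]) / n
    return grad


if __name__ == "__main__":
    # The two sets do not overlap at all, yet every point gets a gradient.
    source = [2.0, 0.0, 1.0]
    target = [11.0, 12.0, 10.0]
    print(ot_match_1d(source, target))
    print(positional_gradient(source, target))
```

In this toy case every source point receives a nonzero gradient toward its matched target even though the two sets are far apart, whereas a purely per-pixel photometric comparison gives little signal once rendered and target content no longer overlap.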

Abstract

The modeling and manipulation of 3D scenes captured from the real world are pivotal in various applications, attracting growing research interest. While previous works on editing have achieved interesting results through manipulating 3D meshes, they often require accurately reconstructed meshes to perform editing, which limits their application in 3D content generation. To address this gap, we introduce a novel single-image-driven 3D scene editing approach based on 3D Gaussian Splatting, enabling intuitive manipulation via directly editing the content on a 2D image plane. Our method learns to optimize the 3D Gaussians to align with an edited version of the image rendered from a user-specified viewpoint of the original scene. To capture long-range object deformation, we introduce a positional loss into the optimization process of 3D Gaussian Splatting and enable gradient propagation through reparameterization. To handle occluded 3D Gaussians when rendering from the specified viewpoint, we build an anchor-based structure and employ a coarse-to-fine optimization strategy capable of handling long-range deformation while maintaining structural stability. Furthermore, we design a novel masking strategy to adaptively identify non-rigid deformation regions for fine-scale modeling. Extensive experiments show the effectiveness of our method in handling geometric details, long-range deformation, and non-rigid deformation, demonstrating superior editing flexibility and quality compared to previous approaches.
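The as-rigid-as-possible (ARAP) regularization named in the TL;DR can be illustrated with the classic ARAP energy evaluated on a small set of anchor points. This is a sketch under stated assumptions rather than the paper's formulation: the helper names (`arap_energy`, `rot_z`, `mat_vec`), the uniform edge weights, and the choice to pass per-anchor rotations in as given matrices are all illustrative. For each anchor $i$ and neighbor $j$, it sums $\lVert (p'_i - p'_j) - R_i (p_i - p_j) \rVert^2$.

```python
import math


def mat_vec(R, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def arap_energy(rest, deformed, rotations, neighbors):
    """Classic ARAP energy with uniform weights (an assumption here).

    rest, deformed: lists of 3D anchor positions before and after editing.
    rotations: one 3x3 rotation per anchor (given, not estimated).
    neighbors: dict mapping an anchor index to its neighbor indices.
    """
    energy = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            rest_edge = [rest[i][k] - rest[j][k] for k in range(3)]
            def_edge = [deformed[i][k] - deformed[j][k] for k in range(3)]
            rotated = mat_vec(rotations[i], rest_edge)
            energy += sum((def_edge[k] - rotated[k]) ** 2 for k in range(3))
    return energy


if __name__ == "__main__":
    rest = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    R = rot_z(0.7)
    shift = [5.0, -2.0, 1.0]
    # A rigid motion (rotation plus translation) of all anchors.
    rigid = [[a + b for a, b in zip(mat_vec(R, p), shift)] for p in rest]
    print(arap_energy(rest, rigid, [R] * 3, neighbors))
```

A rigid motion of all anchors leaves the energy at zero regardless of how far the anchors travel, while stretching is penalized, which is why this kind of term can coexist with long-range edits.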

Paper Structure (17 sections, 18 equations, 11 figures, 2 tables)

Introduction
Related Work
Differentiable Rendering
NeRF and 3D Gaussian Editing
Preliminaries
Method
Positional Derivative
Anchor-Based Deformation
Adaptive Rigidity Masking
Loss Function
Experiment
Long-range Deformation
Geometry Editing
Hybrid Editing
Single-View Video Tracking
...and 2 more sections

Figures (11)

Figure 1: An overview of our method. We address the single-image-driven editing task by an iterative gradient descent process that optimizes the 3D Gaussians to align with the reference image. To model long-range object deformation, we introduce the positional loss. To preserve the geometric consistency of the objects, we propose an anchor-based as-rigid-as-possible regularization scheme, a coarse-to-fine optimization strategy, and an adaptive masking strategy to identify the non-rigid deformation parts.
Figure 2: Visualization of the gradients with respect to the centers of the Gaussians. The positional loss provides consistent, dense gradients that move the bulldozer's shovel downward.
Figure 3: Adaptive rigidity masks. "Distance Mask" and "ARAP Mask" denote the learnable masks of the relative distance regularization term and ARAP regularization term, respectively.
Figure 4: Illustration of the optimization process for long-range rigid transformation.
Figure 5: Geometric editing under different scales.
...and 6 more figures
