Easy3E: Feed-Forward 3D Asset Editing via Rectified Voxel Flow

Shimin Hu, Yuanyi Wei, Fei Zha, Yudong Guo, Juyong Zhang

TL;DR

Easy3E introduces Voxel FlowEdit, an edit-driven flow in the sparse voxel latent space that achieves globally consistent 3D deformation in a single pass, enabling fast, high-fidelity 3D model editing.

Abstract

Existing 3D editing methods rely on computationally intensive scene-by-scene iterative optimization and suffer from multi-view inconsistency. We propose an effective and fully feed-forward 3D editing framework based on the TRELLIS generative backbone, capable of modifying 3D models from a single editing view. Our framework addresses two key issues: adapting training-free 2D editing to structured 3D representations, and overcoming the bottleneck of appearance fidelity in compressed 3D features. To ensure geometric consistency, we introduce Voxel FlowEdit, an edit-driven flow in the sparse voxel latent space that achieves globally consistent 3D deformation in a single pass. To restore high-fidelity details, we develop a normal-guided single-to-multi-view generation module as an external appearance prior, successfully recovering high-frequency textures. Experiments demonstrate that our method enables fast, globally consistent, and high-fidelity 3D model editing.

Paper Structure

This paper contains 19 sections, 15 equations, 7 figures, 3 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Overview of Easy3E. The framework operates in two main stages: Geometry Editing and Texture Refinement. Starting from a rendered source view, an edited target image provides the guidance for editing. In the Geometry Editing stage, the Voxel FlowEdit algorithm transforms the source voxel structure under flow-based guidance, followed by SLAT Repainting that refines local latent features to produce the target mesh. The Texture Refinement stage then employs a generation branch and a normal-guided control adapter to synthesize multi-view-consistent textures, which are projected and fused onto the mesh to yield the final high-fidelity 3D asset.
  • Figure 2: Comparison of FlowEdit's limitations and the Voxel FlowEdit solution. (a) Base FlowEdit: The semantic velocity $\mathbf{v}_{\text{edit}}$ is corrupted by accumulated approximation error, causing the trajectory to drift and resulting in structural corruption (red dashed box). (b) Voxel FlowEdit: The edit is driven by external gradient guidance $\mathbf{G}_{\text{sil}}$, while internal correction $\boldsymbol{\xi}_{\text{traj}}$ maintains manifold consistency. This combined approach achieves a clean and structurally integral edit.
  • Figure 3: Qualitative comparison. Our method achieves clean geometry and consistent appearance across multiple views, faithfully realizing the target edits while preserving unedited regions. Competing methods either retain the original geometry (MVEdit) or exhibit strong structural distortion and inconsistency (Vox-E, Instant3dit).
  • Figure 4: Comparison without and with Flow Guidance. Disabling both $\mathbf{G}_{\text{sil}}$ and $\boldsymbol{\xi}_{\text{traj}}$ leads to structural collapse and view-inconsistent deformation, whereas enabling both yields stable, silhouette-aligned geometry.
  • Figure 5: Comparison without and with Texture Refinement. The refinement stage significantly enhances surface detail and view-consistent appearance.
  • ...and 2 more figures
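
To make the "edit velocity" mentioned in the Figure 2 caption concrete, the following is a minimal toy sketch of a base FlowEdit-style update, not the paper's Voxel FlowEdit. It assumes a hypothetical closed-form velocity field (`toy_velocity`, a rectified flow whose data distribution is a single point `c`) in place of a learned model, and it omits $\mathbf{G}_{\text{sil}}$ and $\boldsymbol{\xi}_{\text{traj}}$ entirely, since their exact form is not given in this summary. All function and variable names are illustrative.

```python
import numpy as np

def toy_velocity(z, t, c):
    # Hypothetical stand-in for a learned velocity model.
    # For the path z_t = (1 - t) * c + t * noise, dz/dt = noise - c = (z - c) / t.
    return (z - c) / t

def flowedit_toy(x_src, c_src, c_tgt, num_steps=50, seed=0):
    # Integrate the difference of target- and source-conditioned velocities
    # from t = 1 down to t = 0, starting at the source sample itself.
    rng = np.random.default_rng(seed)
    x_src = np.asarray(x_src, dtype=float)
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    z_edit = x_src.copy()
    for t, t_next in zip(ts[:-1], ts[1:]):
        noise = rng.standard_normal(x_src.shape)
        z_src = (1.0 - t) * x_src + t * noise   # noised source sample
        z_tgt = z_edit + z_src - x_src          # same noise applied to the edit path
        v_edit = toy_velocity(z_tgt, t, c_tgt) - toy_velocity(z_src, t, c_src)
        z_edit = z_edit + (t_next - t) * v_edit  # Euler step toward t = 0
    return z_edit

x_src = np.array([1.0, -2.0, 0.5])
c_src = np.array([0.0, 0.0, 0.0])
c_tgt = np.array([3.0, 1.0, -1.0])
edited = flowedit_toy(x_src, c_src, c_tgt)
```

In this toy setting the noise cancels in the velocity difference, so the edit shifts the source by `c_tgt - c_src`. With a learned, approximate velocity model that cancellation is no longer exact, which is the kind of accumulated error the Figure 2 caption attributes to base FlowEdit.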