AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows

Zhenglin Zhou, Fan Ma, Chengzhuo Gui, Xiaobo Xia, Hehe Fan, Yi Yang, Tat-Seng Chua

TL;DR

AnchorFlow tackles training-free, mask-free 3D editing by stabilizing latent references through a global latent anchor and an anchor-aligned update rule. The method yields strong semantic edits with preserved geometry, validated on a new Eval3DEdit benchmark across multiple editing types. It avoids mask supervision and enables scalable data curation for instruction-following 3D editing. Quantitative and qualitative results show competitive or superior performance compared to state-of-the-art inversion-free and LFM-based methods.

Abstract

Training-free 3D editing aims to modify 3D shapes based on human instructions without model finetuning. It plays a crucial role in 3D content creation. However, existing approaches often struggle to produce strong or geometrically stable edits, largely due to inconsistent latent anchors introduced by timestep-dependent noise during diffusion sampling. To address these limitations, we introduce AnchorFlow, which is built upon the principle of latent anchor consistency. Specifically, AnchorFlow establishes a global latent anchor shared between the source and target trajectories, and enforces coherence using a relaxed anchor-alignment loss together with an anchor-aligned update rule. This design ensures that transformations remain stable and semantically faithful throughout the editing process. By stabilizing the latent reference space, AnchorFlow enables more pronounced semantic modifications. Moreover, AnchorFlow is mask-free. Without mask supervision, it effectively preserves geometric fidelity. Experiments on the Eval3DEdit benchmark show that AnchorFlow consistently delivers semantically aligned and structurally robust edits across diverse editing types. Code is at https://github.com/ZhenglinZhou/AnchorFlow.
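
To make the idea of latent anchor consistency concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a straight-line (rectified-flow) path $X_t = (1-t)X_0 + t\,\epsilon$ with velocity $\epsilon - X_0$, so that a single-step estimate of the noise endpoint is $X_t + (1-t)\,v$. The helper names (`interpolate`, `one_step_anchor`, `alignment_loss`, `point_mass_velocity`) and the squared-distance form of the loss are hypothetical; the abstract does not define the relaxed anchor-alignment loss. A toy closed-form velocity stands in for the learned model.

```python
import numpy as np

def interpolate(x0, noise, t):
    """Noisy sample on the straight path from data (t=0) to noise (t=1)."""
    return (1.0 - t) * x0 + t * noise

def one_step_anchor(x_t, velocity, t):
    """Single-step estimate of the noise endpoint, assuming velocity ~ noise - data."""
    return x_t + (1.0 - t) * velocity

def alignment_loss(anchor_src, anchor_tar):
    """Mean squared distance between two anchor estimates (assumed form)."""
    return float(np.mean((anchor_src - anchor_tar) ** 2))

def point_mass_velocity(x_t, t, x0):
    """Exact velocity when the data distribution is the single point x0.
    Toy stand-in for a learned flow model; not part of the paper."""
    return (x_t - x0) / t

rng = np.random.default_rng(0)
x_src, x_tar = rng.standard_normal(4), rng.standard_normal(4)
t = 0.6

def anchor_of(x0, noise):
    """Noise a clean latent to time t, then recover its anchor in one step."""
    x_t = interpolate(x0, noise, t)
    return one_step_anchor(x_t, point_mass_velocity(x_t, t, x0), t)

# Shared noise: source and target trajectories refer to the same anchor.
shared = rng.standard_normal(4)
loss_shared = alignment_loss(anchor_of(x_src, shared), anchor_of(x_tar, shared))

# Independent noise: the two trajectories refer to different anchors.
other = rng.standard_normal(4)
loss_independent = alignment_loss(anchor_of(x_src, shared), anchor_of(x_tar, other))

print(loss_shared, loss_independent)
```

In this toy setting the shared-noise loss vanishes up to floating-point error, while independently drawn noise leaves a clear gap between the two anchors. With a learned velocity model the one-step anchor is only an approximation, which is presumably why an explicit alignment term is useful.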

Paper Structure

This paper contains 27 sections, 20 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overview of 3D Editing Results from AnchorFlow Across Diverse Editing Tasks. We present four major types of edits supported by AnchorFlow: (1) Action Change: altering the pose or articulation of the 3D shape; (2) Object Addition: introducing new geometric elements; (3) Object Replacement: substituting existing components with new ones; (4) Style Change: modifying the shape style while preserving the overall.
  • Figure 2: Effect of Latent Anchor Selection on 3D Editing. (a) Random timestep-wise anchors cause inconsistent flows, resulting in under-editing and geometric breakage. (b) Fixed anchors over-constrain trajectories and push the model away from the source manifold, causing over-editing. (c) Our aligned anchors maintain consistent latent references, enabling balanced editing.
  • Figure 3: Overview of AnchorFlow for Training-Free and Mask-Free 3D Editing. Given a source model and an editing instruction, AnchorFlow first constructs the source sample $\bm{X}^{\mathrm{src}}_t$ and forms the editing sample $\bm{X}^{\mathrm{FE}}_t$ at timestep $t$. A 3D flow-based model $\bm{v}_\theta$ predicts velocity fields for both the source and target samples. To stabilize the editing process, AnchorFlow performs a single-step inversion to approximate the latent anchors $F_t(\bm{X}^{\mathrm{src}}_t)$ and $F_t(\bm{X}^{\mathrm{tar}}_t)$, and aligns them in noise space via the anchor-aligned update guided by $\nabla \mathcal{L}_\mathrm{align}$. This design enforces consistent latent anchors, mitigates geometric distortions, and produces structurally stable 3D edits.
  • Figure 4: Qualitative Comparisons. Each column shows condition pairs, the source model, and the corresponding results from various baselines and our method. Compared with previous approaches, especially Inversion-Free Editing (FlowEdit; Kulikov et al., 2024), our method produces edits that are both semantically faithful and geometrically consistent, effectively mitigating cases of insufficient edits and distorted geometry.
  • Figure 5: The Effect of Averaging Directions. Averaging $n_\mathrm{avg}$ noisy flow directions stabilizes updates, but at added computational cost. By contrast, AnchorFlow achieves better results with almost no extra time overhead.
  • ...and 5 more figures
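
The direction averaging mentioned in the Figure 5 caption can be sketched as follows. This is an illustrative sketch that assumes the inversion-free (FlowEdit-style) construction, in which each noisy source sample is coupled to a target sample by adding the current edit offset. The function `averaged_direction` and the callables `v_src` and `v_tar` (velocity predictors under the source and target conditions) are hypothetical names, and the `tanh` fields in the usage lines are toy stand-ins for a 3D flow model.

```python
import numpy as np

def averaged_direction(x_src, x_fe, t, v_src, v_tar, n_avg, rng):
    """Average n_avg noisy edit directions at time t (hypothetical helper).

    x_src : clean source latent (float array)
    x_fe  : current edited latent on the editing path
    v_src, v_tar : callables (x, t) -> velocity under source / target condition
    """
    total = np.zeros_like(x_src)
    for _ in range(n_avg):
        noise = rng.standard_normal(x_src.shape)     # fresh noise per draw
        x_src_t = (1.0 - t) * x_src + t * noise      # noisy source sample
        x_tar_t = x_fe + x_src_t - x_src             # coupled target sample
        total += v_tar(x_tar_t, t) - v_src(x_src_t, t)
    return total / n_avg

# Toy usage: nonlinear velocity fields make the direction depend on the noise.
rng = np.random.default_rng(0)
x_src = np.zeros(8)
x_fe = x_src.copy()

def toy_v_src(x, t):
    return np.tanh(x)

def toy_v_tar(x, t):
    return np.tanh(x - 1.0)

single = np.stack([averaged_direction(x_src, x_fe, 0.5, toy_v_src, toy_v_tar, 1, rng)
                   for _ in range(200)])
multi = np.stack([averaged_direction(x_src, x_fe, 0.5, toy_v_src, toy_v_tar, 16, rng)
                  for _ in range(200)])
print(single.var(axis=0).mean(), multi.var(axis=0).mean())
```

Each extra draw costs two more model evaluations, so the variance reduction from a larger `n_avg` is paid for in compute. That is the trade-off the caption contrasts with AnchorFlow's consistent anchors.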