Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang

TL;DR

Freditor introduces a frequency-decomposed NeRF editing framework that edits low-frequency image components in feature space to achieve high-fidelity and transferable 3D scene edits. A dual-branch architecture preserves high-frequency content for detail while applying stylistic edits in a low-frequency space, supported by a training pipeline with stage-wise optimization and intensity control via interpolation. The method enables transfer of stylization learned from 2D images to new 3D scenes without retraining and supports controllable strength and local/object edits, yielding improved multi-view consistency and image sharpness over prior approaches. Extensive experiments on real and large-scale scenes show superior consistency, perceptual quality, and transferability compared with baselines like Instruct-NeRF2NeRF, validating the practical impact for scalable 3D scene editing with text or image guidance.

Abstract

This paper enables high-fidelity, transferable NeRF editing by frequency decomposition. Recent NeRF editing pipelines lift 2D stylization results to 3D scenes but produce blurry results and fail to capture detailed structures, because the 2D edits are inconsistent with one another. Our critical insight is that the low-frequency components of images are more multiview-consistent after editing than their high-frequency parts. Moreover, appearance style is mainly exhibited in the low-frequency components, while content details reside mainly in the high-frequency parts. This motivates us to perform editing on the low-frequency components, which results in high-fidelity edited scenes. In addition, the editing is performed in the low-frequency feature space, enabling stable intensity control and novel scene transfer. Comprehensive experiments on photorealistic datasets demonstrate superior performance in high-fidelity and transferable NeRF editing. The project page is at https://aigc3d.github.io/freditor.
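
The idea in the abstract (edit only the low-frequency part, keep the high-frequency details, and control edit strength by interpolation) can be illustrated with a minimal image-space sketch. This is not the paper's implementation: Freditor operates on a low-frequency feature field inside a NeRF, whereas the sketch below assumes a plain grayscale image stored as nested lists, a box blur as the low-pass filter, and a hypothetical `toy_edit` function standing in for a 2D style editor. All function names are illustrative.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def decompose(img, radius=1):
    """Split an image into a low-frequency part and a high-frequency residual."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


def toy_edit(low):
    """Hypothetical stand-in for a 2D style editor (assumption, not from the paper)."""
    return [[0.5 * v + 0.1 for v in row] for row in low]


def edit_with_intensity(img, edit_low, alpha, radius=1):
    """Edit the low-frequency part only, then add the untouched details back.

    alpha = 0 returns the original image; alpha = 1 applies the full edit.
    """
    low, high = decompose(img, radius)
    edited = edit_low(low)
    return [
        [alpha * e + (1.0 - alpha) * l + h for e, l, h in zip(er, lr, hr)]
        for er, lr, hr in zip(edited, low, high)
    ]
```

Because the high-frequency residual is added back unchanged, fine structure survives regardless of the edit strength; only the smooth appearance component moves between the original and the edited version as `alpha` varies.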

Paper Structure

This paper contains 38 sections, 4 theorems, 40 equations, 14 figures, and 3 tables.

Key Result

Theorem 1

(Informal) Consider an optical flow-based consistency score $c(I_1, I_2)$ between images $I_1$ and $I_2$, we have

Figures (14)

  • Figure 1: High-fidelity and transferable 3D scene editing with text instructions. (a) High-fidelity scene editing compared with Instruct-NeRF2NeRF. Three color patches are zoomed in to highlight the details. (b) An edit trained on one scene can be directly transferred to novel scenes without retraining.
  • Figure 2: Visualization of the difference between two styles in different frequency images. Two styles of the same scene are displayed. The MSE between the two styles is computed, and the RGB distribution curves are placed next to each image. The discrepancy between the styles is minor in the high-frequency images and significant in the low-frequency images.
  • Figure 3: Visualization of inconsistent regions caused by 2D editing. Images from left to right show edited view 1, edited view 2, view 1 warped to view 2 by an optical-flow algorithm, and the inconsistency map between the two views. The inconsistency in the edited high-frequency details is larger than in the low-frequency ones.
  • Figure 4: Overall framework. Our pipeline comprises two primary branches: the high-frequency branch, reconstructed from multiview images, which ensures view-consistent scene details, and the low-frequency branch, responsible for filtering low-frequency components from the full scene feature fields, performing style transfer, and decoding the original and edited low-frequency images. Finally, the high-frequency details are reintegrated into the edited low-frequency image, resulting in a high-fidelity edited scene.
  • Figure 5: Qualitative Results: Our method can perform high-fidelity edits on real scenes, including environmental changes like adjusting the season, weather, or time, and customized changes of an object or a person in a scene. Moreover, the editing trained on scene 1 could be directly transferred to a novel scene 2, and the editing trained on a collection of 2D images could be transferred to 3D scenes.
  • ...and 9 more figures
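
The inconsistency map described in the Figure 3 caption (warp one edited view into the other using optical flow, then compare) can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's consistency score: it takes the flow as given rather than estimating it, uses nearest-neighbor backward warping on grayscale nested lists, and the names `warp_with_flow`, `inconsistency_map`, and `mean_inconsistency` are hypothetical.

```python
def warp_with_flow(src, flow):
    """Backward-warp src into the target view.

    flow[y][x] = (dx, dy) points from target pixel (x, y) to its source pixel.
    Pixels whose source falls outside the image are marked None.
    """
    h, w = len(src), len(src[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + int(round(dx)), y + int(round(dy))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = src[sy][sx]
    return out


def inconsistency_map(view1, view2, flow):
    """Per-pixel absolute difference between warped view 1 and view 2."""
    warped = warp_with_flow(view1, flow)
    return [
        [None if a is None else abs(a - b) for a, b in zip(wr, vr)]
        for wr, vr in zip(warped, view2)
    ]


def mean_inconsistency(inc_map):
    """Average the map over valid (non-None) pixels; 0.0 if none are valid."""
    valid = [v for row in inc_map for v in row if v is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

Applying the same measurement separately to the low-frequency and high-frequency parts of two edited views would be one way to check the caption's claim on a given scene, under the assumptions above.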

Theorems & Definitions (4)

  • Theorem 1
  • Theorem
  • Theorem
  • Lemma 1