Catalyst4D: High-Fidelity 3D-to-4D Scene Editing via Dynamic Propagation

Shifeng Chen, Yihui Li, Jun Liao, Hongyu Yang, Di Huang

Abstract

Recent advances in 3D scene editing using NeRF and 3DGS enable high-quality static scene editing. In contrast, dynamic scene editing remains challenging, as methods that directly extend 2D diffusion models to 4D often produce motion artifacts, temporal flickering, and inconsistent style propagation. We introduce Catalyst4D, a framework that transfers high-quality 3D edits to dynamic 4D Gaussian scenes while maintaining spatial and temporal coherence. At its core, Anchor-based Motion Guidance (AMG) builds a set of structurally stable and spatially representative anchors from both original and edited Gaussians. These anchors serve as robust region-level references, and their correspondences are established via optimal transport to enable consistent deformation propagation without cross-region interference or motion drift. Complementarily, Color Uncertainty-guided Appearance Refinement (CUAR) preserves temporal appearance consistency by estimating per-Gaussian color uncertainty and selectively refining regions prone to occlusion-induced artifacts. Extensive experiments demonstrate that Catalyst4D achieves temporally stable, high-fidelity dynamic scene editing and outperforms existing methods in both visual quality and motion coherence.

Paper Structure (26 sections, 24 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: We present Catalyst4D, a framework that propagates single-frame 3D edits to dynamic sequences. It excels at both precise local modifications and high-quality global style transfer. Catalyst4D demonstrates robust performance on both monocular (left) and multi-camera (right) scenes. Please refer to the supplementary material for more intuitive visual results.
  • Figure 2: Overview of Catalyst4D. Given the first-frame edited dynamic Gaussians, our (a) Anchor-based Motion Guidance establishes region-level correspondences with the original Gaussians via anchor construction and optimal transport, enabling reliable deformation transfer. Then, (b) Color Uncertainty-guided Appearance Refinement leverages first-frame warping and Gaussian color consistency to identify and correct motion-induced artifacts across time.
  • Figure 3: Qualitative editing results across multiple scenes: Cut-beef, Coffee-martini, and Sear-steak (DyNeRF); Discussion and Trimming (MeetRoom); 3Dprinter and Torchocolate (HyperNeRF). Our method successfully edits dynamic scenes while adhering to user instructions.
  • Figure 4: Qualitative comparison with Instruct 4D-to-4D, Instruct-4DGS and CTRL-D. Red boxes indicate magnified regions. While competing methods often cause unintended modifications to non-target regions, Catalyst4D demonstrates precise, localized editing.
  • Figure 5: Qualitative comparison of localized editing. In contrast to CTRL-D, which introduces inconsistencies in non-edited regions, our method achieves more precise and localized editing by constraining dynamic Gaussians via 3D editing gradients.
  • ...and 7 more figures