Fast Multi-view Consistent 3D Editing with Video Priors

Liyi Chen; Ruihuang Li; Guowen Zhang; Pengfei Wang; Lei Zhang

TL;DR

ViP3DE introduces video priors into text-driven 3D editing to enforce multi-view consistency in a single forward pass. Conditioned on a single edited view, a pre-trained video generation model produces the remaining edited views at predefined camera poses, using motion-preserved noise blending and geometry-aware denoising during diffusion; these views then update the 3D Gaussian representation directly. The approach yields higher editing fidelity and faster convergence than prior 3D- or video-based methods while maintaining pose consistency across views. Drastic geometric edits remain a limitation, pointing to future work on expanding editing capabilities and geometry-aware priors.
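The summary does not spell out the blending rule. As a purely illustrative stand-in (not ViP3DE's formulation), the sketch below initializes the video latents SDEdit-style: the original scene is assumed to be rendered at the predefined camera poses and already encoded into latents, so the camera motion is carried by those latents; frame 0 is replaced by the single edited view; and the stack is noised with the standard DDPM forward process. The per-frame noise mixes one shared sample with independent samples, a generic correlated-noise trick swapped in here for illustration. The names `blend_noise` and `init_video_latents` and the parameter `lam` are hypothetical.

```python
import numpy as np


def blend_noise(eps_shared, eps_frame, lam):
    """Variance-preserving mix: if both inputs are independent N(0, 1), so is the result."""
    return np.sqrt(lam) * eps_shared + np.sqrt(1.0 - lam) * eps_frame


def init_video_latents(render_latents, edited_first, alpha_bar_t, lam, rng):
    """Hypothetical initialization of F video latents before denoising.

    render_latents: (F, C, H, W) latents of the ORIGINAL scene rendered at the
        predefined camera poses (assumed input); they carry the camera motion.
    edited_first:   (C, H, W) latent of the single edited view, used as frame 0.
    alpha_bar_t:    cumulative noise-schedule value at the start timestep, in [0, 1].
    lam:            fraction of the noise shared by all frames, in [0, 1].
    """
    if not (0.0 <= alpha_bar_t <= 1.0 and 0.0 <= lam <= 1.0):
        raise ValueError("alpha_bar_t and lam must lie in [0, 1]")
    x0 = np.array(render_latents, dtype=float)  # copy, so the input is untouched
    x0[0] = edited_first                        # condition on the edited view
    eps_shared = rng.standard_normal(x0.shape[1:])  # one sample for all frames
    eps_frame = rng.standard_normal(x0.shape)       # independent sample per frame
    eps = blend_noise(eps_shared[None], eps_frame, lam)  # broadcast over frames
    # DDPM forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    renders = np.zeros((4, 2, 8, 8))  # 4 frames of toy 2x8x8 latents
    edited = np.ones((2, 8, 8))
    x_t = init_video_latents(renders, edited, alpha_bar_t=0.5, lam=0.3, rng=rng)
    print(x_t.shape)
```

With `lam = 0` every frame receives independent noise; larger values correlate the frames, which is one generic way to bias a video model toward temporally coherent output. Smaller `alpha_bar_t` discards more of the rendered structure, trading motion fidelity for editing freedom.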

Abstract

Text-driven 3D editing enables user-friendly 3D object or scene editing with text instructions. Due to the lack of multi-view consistency priors, existing methods typically process each view individually with 2D generation or editing models, followed by iterative 2D-3D-2D updating. However, these methods are not only time-consuming but also prone to over-smoothed results, because the differing editing signals gathered from different views are averaged during the iterative process. In this paper, we propose generative Video-Prior-based 3D Editing (ViP3DE), which exploits the temporal consistency priors of pre-trained video generation models for multi-view consistent 3D editing in a single forward pass. Our key insight is to condition the video generation model on a single edited view so that it generates the other, consistent edited views, which directly update the 3D representation and thereby bypass the iterative editing paradigm. Since 3D updating requires edited views to be paired with specific camera poses, we propose motion-preserved noise blending, which lets the video model generate edited views at predefined camera poses. In addition, we introduce geometry-aware denoising, which integrates 3D geometric priors into the video model to further enhance multi-view consistency. Extensive experiments demonstrate that ViP3DE achieves high-quality 3D editing results even within a single forward pass, significantly outperforming existing methods in both editing quality and speed.
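The over-smoothing argument can be made concrete with a toy model, assumed here for illustration rather than taken from the paper: a single 3D value (say, the red channel of one surface point) is fitted by gradient descent to the targets that independent per-view 2D edits assign it. With a squared-error loss the fit converges to the mean of those targets, so two views that edit the point to 1.0 and 0.0 leave it at a washed-out 0.5, whereas consistent targets survive intact. The function `fit_shared_value` and its parameters are hypothetical.

```python
def fit_shared_value(targets, lr=0.1, steps=200, init=0.0):
    """Fit one shared 3D value v to per-view edit targets by gradient descent.

    Loss: mean over views of (v - t_i)^2. Its minimizer is the mean of the
    targets, so conflicting per-view edits are averaged away.
    """
    if not targets:
        raise ValueError("need at least one per-view target")
    v = init
    for _ in range(steps):
        # Gradient of the mean squared error with respect to v.
        grad = sum(2.0 * (v - t) for t in targets) / len(targets)
        v -= lr * grad
    return v


if __name__ == "__main__":
    # Two views disagree: one edits the point to full red (1.0), the other to 0.0.
    print(round(fit_shared_value([1.0, 0.0]), 6))       # 0.5
    # Three views agree on the edit, so nothing is averaged away.
    print(round(fit_shared_value([0.9, 0.9, 0.9]), 6))  # 0.9
```

Each step shrinks the distance to the target mean by a constant factor, which is why iterative fusion converges reliably yet cannot recover a sharp edit once the views disagree; generating all edited views from one consistent video pass avoids feeding conflicting targets in the first place.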
