Neural Video Fields Editing
Shuzhou Yang, Chong Mou, Jiwen Yu, Yuhan Wang, Xiandong Meng, Jian Zhang
TL;DR
NVEdit tackles memory and temporal inconsistency in long-video editing by learning a Neural Video Field (NVF) with tri-plane encoding and sparse grids to capture temporal priors, followed by editing via off-the-shelf diffusion-based T2I models guided by user prompts. The two-stage pipeline uses random-pixel video fitting to learn priors, then frame-wise editing with pseudo-ground-truths and a progressive optimization to preserve temporal coherence; an auxiliary IP2P+ mask further enhances local editing precision. Results show NVEdit can edit hundreds of frames with high inter-frame consistency and supports frame interpolation without fine-tuning, while remaining memory-efficient relative to frame-based diffusion methods. The framework is modular, allowing replacement or upgrading of both NVF components and the T2I models, enabling flexible adaptation to diverse editing tasks and future research directions.
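The two-stage pipeline in the TL;DR can be illustrated with a toy NumPy sketch. Everything in it is an illustrative assumption. `ToyTriPlane`, `fit_random_pixels`, `toy_edit`, and `progressive_edit` are hypothetical names. The field is a tiny additive tri-plane with nearest-cell lookup, with no MLP decoder and no sparse grid. `toy_edit` is a fixed recoloring that only stands in for an off-the-shelf T2I editor such as IP2P. We also read "progressive optimization" as an iterative dataset update in the style of Instruct-NeRF2NeRF: each round refreshes one frame's pseudo-ground-truth and then refits the field. NVEdit's actual schedule may differ.

```python
import numpy as np

class ToyTriPlane:
    """Hypothetical additive tri-plane: rgb(t, y, x) = P_yx[y, x] + P_tx[t, x] + P_ty[t, y]."""

    def __init__(self, T, H, W, seed=0):
        rng = np.random.default_rng(seed)
        self.p_yx = 0.01 * rng.standard_normal((H, W, 3))
        self.p_tx = 0.01 * rng.standard_normal((T, W, 3))
        self.p_ty = 0.01 * rng.standard_normal((T, H, 3))

    def render(self, t, y, x):
        # Each query reads one cell from each plane (nearest-cell lookup) and sums them.
        return self.p_yx[y, x] + self.p_tx[t, x] + self.p_ty[t, y]

    def render_frame(self, k, H, W):
        y, x = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        return self.render(np.full_like(y, k), y, x)

    def render_frames(self, T, H, W):
        t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
        return self.render(t, y, x)

    def sgd_step(self, t, y, x, target, lr=0.02):
        # The gradient of 0.5 * sum of squared errors w.r.t. each looked-up cell is the error;
        # np.add.at accumulates correctly when several samples hit the same cell.
        err = self.render(t, y, x) - target
        np.add.at(self.p_yx, (y, x), -lr * err)
        np.add.at(self.p_tx, (t, x), -lr * err)
        np.add.at(self.p_ty, (t, y), -lr * err)


def fit_random_pixels(field, frames, steps, batch=256, seed=0):
    """Stage 1 sketch: every step samples random (t, y, x) pixels from the whole clip."""
    rng = np.random.default_rng(seed)
    T, H, W, _ = frames.shape
    for _ in range(steps):
        t = rng.integers(0, T, batch)
        y = rng.integers(0, H, batch)
        x = rng.integers(0, W, batch)
        field.sgd_step(t, y, x, frames[t, y, x])


def toy_edit(original_frame, rendered_frame):
    """Hypothetical T2I stand-in: pulls the current render halfway toward a
    recolored (channel-reversed) version of the original frame."""
    return 0.5 * rendered_frame + 0.5 * original_frame[..., ::-1]


def progressive_edit(field, frames, rounds, steps_per_round=40, seed=0):
    """Stage 2 sketch: refresh one frame's pseudo-ground-truth per round, then refit."""
    T, H, W, _ = frames.shape
    pseudo_gt = frames.copy()
    for r in range(rounds):
        k = r % T  # cycle through frames
        pseudo_gt[k] = toy_edit(frames[k], field.render_frame(k, H, W))
        fit_random_pixels(field, pseudo_gt, steps_per_round, seed=seed + r + 1)
    return pseudo_gt


# Usage: a toy 8-frame clip whose red channel drifts horizontally over time.
T = H = W = 8
t, y, x = np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij")
video = np.stack([0.5 + 0.4 * np.sin(0.8 * (x - t)),
                  0.5 + 0.4 * np.cos(0.6 * y),
                  np.full(x.shape, 0.5)], axis=-1)
field = ToyTriPlane(T, H, W)
fit_random_pixels(field, video, steps=300)
print("stage-1 MSE:", np.mean((field.render_frames(T, H, W) - video) ** 2))
progressive_edit(field, video, rounds=6 * T)
print("MSE to edit target:", np.mean((field.render_frames(T, H, W) - video[..., ::-1]) ** 2))
```

Each step trains on a small batch of random pixels drawn from the whole clip, not on whole frames. As a result, per-step memory in this sketch depends on the batch size rather than the frame count. Each pseudo-ground-truth is also only a partial step away from the current render, so the shared field moves toward the edit gradually instead of absorbing every frame's edit independently.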
Abstract
Diffusion models have revolutionized text-driven video editing. However, applying these methods to real-world editing encounters two significant challenges: (1) GPU memory demand grows rapidly as the number of frames increases, and (2) edited videos suffer from inter-frame inconsistency. To this end, we propose NVEdit, a novel text-driven video editing framework designed to reduce memory overhead and improve editing consistency for real-world long videos. Specifically, we construct a neural video field, powered by a tri-plane and a sparse grid, to encode long videos with hundreds of frames in a memory-efficient manner. Next, we update the video field through off-the-shelf Text-to-Image (T2I) models to impart text-driven editing effects. A progressive optimization strategy is developed to preserve the original temporal priors. Importantly, both the neural video field and the T2I model are adaptable and replaceable, leaving room for future research to upgrade either component. Experiments demonstrate that our approach can edit hundreds of frames with impressive inter-frame consistency. Our project is available at: https://nvedit.github.io/.
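The following is a minimal sketch of querying a tri-plane plus sparse-grid field at continuous (x, y, t) coordinates. It assumes that features from the three axis-aligned planes are summed and that a sparse voxel dictionary adds a residual only where a voxel has been allocated. The names `bilinear_sample`, `TriPlaneSparseGrid`, and `param_counts` are hypothetical, as are the summation, the resolutions, and the dictionary-based sparse grid. The decoder that maps features to RGB is omitted.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample an (R, R, C) plane at coords u (columns), v (rows) in [0, 1]."""
    R = plane.shape[0]
    x = np.clip(u, 0.0, 1.0) * (R - 1)
    y = np.clip(v, 0.0, 1.0) * (R - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, R - 1)
    y1 = np.minimum(y0 + 1, R - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * plane[y0, x0] + wx * plane[y0, x1]
    bottom = (1 - wx) * plane[y1, x0] + wx * plane[y1, x1]
    return (1 - wy) * top + wy * bottom


class TriPlaneSparseGrid:
    """Hypothetical NVF encoder: summed plane features plus a sparse voxel residual."""

    def __init__(self, res=32, channels=8, grid_res=16, seed=0):
        rng = np.random.default_rng(seed)
        # One (res, res, channels) plane per axis pair of the video volume.
        self.planes = {k: 0.1 * rng.standard_normal((res, res, channels))
                       for k in ("xy", "xt", "yt")}
        self.grid_res = grid_res
        self.channels = channels
        # Sparse grid: only explicitly allocated voxels store a feature vector.
        self.voxels = {}

    def query(self, xyt):
        """Return (N, channels) features for N normalized (x, y, t) coordinates."""
        x, y, t = xyt[:, 0], xyt[:, 1], xyt[:, 2]
        feat = (bilinear_sample(self.planes["xy"], x, y)
                + bilinear_sample(self.planes["xt"], x, t)
                + bilinear_sample(self.planes["yt"], y, t))
        # Nearest-voxel lookup; unallocated voxels contribute zero.
        idx = np.clip((xyt * self.grid_res).astype(int), 0, self.grid_res - 1)
        zero = np.zeros(self.channels)
        residual = np.stack([self.voxels.get(tuple(int(c) for c in i), zero) for i in idx])
        return feat + residual


def param_counts(frames, height, width, res, channels):
    """Stored values: dense RGB clip vs. three res x res x channels planes."""
    dense = frames * height * width * 3
    triplane = 3 * res * res * channels
    return dense, triplane


field = TriPlaneSparseGrid()
field.voxels[(0, 0, 0)] = np.ones(8)
pts = np.random.default_rng(1).random((5, 3))
print(field.query(pts).shape)  # (5, 8)
print(param_counts(200, 512, 512, 512, 16))  # (157286400, 12582912)
```

The `param_counts` comparison covers stored values only, not GPU memory during diffusion-based editing. In this sketch the plane size does not depend on the frame count, although a real model would likely need more resolution along t for longer clips.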
