Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng, Zerong Zheng, Boyao Zhou, Hongwen Zhang, Yebin Liu
TL;DR
Control4D edits dynamic 4D portraits from text instructions, targeting both efficiency and consistency. It introduces GaussianPlanes, a 4D representation that structures Gaussian Splatting through plane-based decomposition in 3D space and time, making the representation more efficient and more stable. A 4D generator then learns from the inconsistent images produced by the diffusion-based editor, so the final edits are coherent and high quality. The framework pairs this GAN-based generator with a diffusion-based editing loop, giving fast training and spatiotemporal consistency across views and time. Experiments show faster convergence, better rendering quality, and stronger temporal coherence than prior 4D editing approaches.
Abstract
We introduce Control4D, an innovative framework for editing dynamic 4D portraits using text instructions. Our method addresses the prevalent challenges in 4D editing, notably the inefficiency of existing 4D representations and the inconsistent editing effects caused by diffusion-based editors. We first propose GaussianPlanes, a novel 4D representation that makes Gaussian Splatting more structured by applying plane-based decomposition in 3D space and time. This enhances both efficiency and robustness in 4D editing. Furthermore, we propose to leverage a 4D generator to learn a more continuous generation space from the inconsistent edited images produced by the diffusion-based editor, which effectively improves the consistency and quality of 4D editing. Comprehensive evaluation demonstrates the superiority of Control4D, including significantly reduced training time, high-quality rendering, and spatiotemporal consistency in 4D portrait editing. The link to our project website is https://control4darxiv.github.io.
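To make the idea of plane-based decomposition in 3D space and time concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one 2D feature grid per pair of axes among (x, y, z, t), bilinear lookup on each grid, and concatenation of the results; the six-plane layout, the concatenation, and the names `make_planes`, `bilinear`, and `query_feature` are all illustrative assumptions that the abstract does not specify.

```python
import itertools
import numpy as np

# Axis indices: 0 = x, 1 = y, 2 = z, 3 = t.
# Assumed layout: one plane per axis pair (xy, xz, xt, yz, yt, zt).
PLANE_PAIRS = list(itertools.combinations(range(4), 2))


def make_planes(resolution, channels, seed=0):
    """Create one (resolution, resolution, channels) feature grid per axis pair."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(resolution, resolution, channels)) for _ in PLANE_PAIRS]


def bilinear(grid, u, v):
    """Bilinearly interpolate a feature grid at continuous coords u, v in [0, 1]."""
    res = grid.shape[0]
    fu, fv = u * (res - 1), v * (res - 1)
    i0, j0 = int(np.floor(fu)), int(np.floor(fv))
    i1, j1 = min(i0 + 1, res - 1), min(j0 + 1, res - 1)
    wu, wv = fu - i0, fv - j0
    row0 = (1 - wv) * grid[i0, j0] + wv * grid[i0, j1]
    row1 = (1 - wv) * grid[i1, j0] + wv * grid[i1, j1]
    return (1 - wu) * row0 + wu * row1


def query_feature(planes, point):
    """Project a 4D point (x, y, z, t) onto each plane and concatenate the features."""
    feats = [bilinear(grid, point[a], point[b]) for grid, (a, b) in zip(planes, PLANE_PAIRS)]
    return np.concatenate(feats)


planes = make_planes(resolution=16, channels=4)
feature = query_feature(planes, np.array([0.25, 0.5, 0.75, 0.1]))
print(feature.shape)  # (24,)
```

In a representation of this kind, the queried feature would typically be decoded into per-Gaussian attributes; that decoding step is omitted here because the abstract does not describe it. The appeal of the factorization is that six 2D grids grow quadratically with resolution, whereas a dense 4D grid would grow with the fourth power.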
