Real-time 3D-aware Portrait Editing from a Single Image
Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen
TL;DR
3DPE addresses real-time, 3D-consistent portrait editing from a single image by distilling editing priors from a diffusion-based image editor and a 3D GAN into a lightweight module built on Live3D/EG3D. Cross-attention-based prompt conditioning and a dual-branch feature strategy preserve geometry while enabling appearance edits driven by either image or text prompts, with training guided by 2D pseudo-labels and 3D supervision. The approach runs in real time (~40 ms per image), adapts to user-defined prompts in about 5 minutes, and produces novel views that are more consistent than those of 2D-first and optimization-heavy baselines, with practical applications in AR/VR, teleconferencing, and video editing. It also provides an interactive editing system and efficient adaptation pipelines. The authors note limitations in fine-grained novel-view details and occasional video flicker.
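As a rough illustration of what cross-attention-based prompt conditioning means in general, the sketch below lets image feature tokens (queries) attend to prompt tokens (keys and values) and adds the result back as a residual. It is not the authors' implementation: the function names `cross_attention` and `condition_on_prompt` are hypothetical, the learned query/key/value projections of a real attention layer are omitted, and plain Python lists stand in for tensors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: list of d-dimensional vectors (here, image feature tokens)
    keys, values: lists of vectors of equal length (here, prompt tokens)
    """
    d = len(queries[0])
    value_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(value_dim)]
        )
    return outputs


def condition_on_prompt(image_tokens, prompt_tokens):
    """Hypothetical conditioning step: residual add of prompt-attended features.

    Assumes image and prompt tokens share the same feature dimension.
    """
    attended = cross_attention(image_tokens, prompt_tokens, prompt_tokens)
    return [
        [x + a for x, a in zip(token, att)]
        for token, att in zip(image_tokens, attended)
    ]


image_tokens = [[1.0, 0.0], [0.0, 1.0]]
prompt_tokens = [[0.5, 0.5]]
conditioned = condition_on_prompt(image_tokens, prompt_tokens)
```

With a single prompt token, every image token receives that token's features in full, since the attention weights collapse to 1. With several prompt tokens, each image token receives a different mixture depending on its similarity to each prompt token.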
Abstract
This work presents 3DPE, a practical method that efficiently edits a face image according to a given prompt, such as a reference image or a text description, in a 3D-aware manner. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. This design brings two compelling advantages over existing approaches. First, our method achieves real-time editing with a feedforward network (~0.04 s per image), over 100x faster than the next-fastest competitor. Second, thanks to the powerful priors, our module can focus on learning editing-related variations, so it handles various types of editing simultaneously during training and further supports fast adaptation to user-specified custom editing types at inference (e.g., with ~5 min of fine-tuning per style).
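The speed figures above can be turned into a quick back-of-the-envelope check. The sketch below assumes images are processed one at a time with no batching, and the helper names are hypothetical. It uses only the numbers quoted in the abstract.

```python
LATENCY_MS = 40        # from the abstract: ~0.04 s per image
SPEEDUP_FACTOR = 100   # from the abstract: "over 100x faster"


def frames_per_second(latency_ms):
    """Throughput implied by a per-image latency, assuming sequential processing."""
    return 1000 / latency_ms


def baseline_latency_lower_bound_ms(latency_ms, speedup):
    """If a method is at least `speedup` times faster, the baseline needs at least this long."""
    return latency_ms * speedup


fps = frames_per_second(LATENCY_MS)                                         # 25.0
baseline_ms = baseline_latency_lower_bound_ms(LATENCY_MS, SPEEDUP_FACTOR)   # 4000
```

Under these assumptions, ~40 ms per image corresponds to about 25 images per second, and a competitor that is more than 100x slower needs more than 4 s per image.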
