FPGA: Flexible Portrait Generation Approach
Zhaoli Deng, Fanyi Wang, Junkang Zhang, Fan Chen, Meng Zhang, Wendong Zhang, Wen Liu, Zhenpeng Mi
TL;DR
FPGA tackles the challenge of multi‑ID, full‑body portrait generation, where faces occupy only a small, low‑resolution region of the image, by combining a Multi‑Mode Fusion training strategy (MMF) and a DDIM Inversion based ID Restoration framework (DIIR). It introduces IDZoom, a million‑scale multi‑modal dataset, and a RepControlNet‑based acceleration to deliver fast, region‑specific identity control and post‑hoc face restoration on diffusion models. In comparative and ablation experiments, FPGA outperforms the compared methods on both objective and subjective metrics and demonstrates controllable multi‑ID placement, face restoration, and face swapping that also works on stylized faces, with inference completing within 2.5 s on a single L20 GPU. DIIR is plug‑and‑play and can be applied to existing diffusion‑based portrait generation methods, which makes the system practical for high‑fidelity, controllable portrait synthesis.
Abstract
Portrait Fidelity Generation is a prominent research area in generative models. Current methods struggle to generate full-body images in which faces are low-resolution, especially when a single image contains multiple identities (multi-ID). To tackle these issues, we propose a comprehensive system called FPGA and construct a million-scale multi-modal dataset, IDZoom, for training. FPGA consists of a Multi-Mode Fusion training strategy (MMF) and a DDIM Inversion based ID Restoration inference framework (DIIR). MMF aims to activate the specified ID in the specified facial region. DIIR aims to remove face artifacts while preserving the background. Furthermore, DIIR is plug-and-play and can be applied to any diffusion-based portrait generation method to enhance its performance. DIIR can also perform face swapping and is applicable to stylized faces. To validate the effectiveness of FPGA, we conducted extensive comparative and ablation experiments. The results show that FPGA has significant advantages in both subjective and objective metrics and achieves controllable generation in multi-ID scenarios. In addition, we reduce inference time to within 2.5 seconds on a single L20 graphics card, mainly through our well-designed reparameterization method, RepControlNet.
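The abstract attributes the speedup to reparameterization but does not spell out the mechanism. As a minimal sketch of the general idea only, assuming two parallel linear branches that receive the same input, the branch weights can be folded into a single layer once training is done, so inference pays for one branch instead of two. The function names and toy values below are hypothetical and are not taken from RepControlNet.

```python
# Toy illustration of structural reparameterization.
# Assumption: both branches are linear and see the same input.
# This is a generic sketch, not the RepControlNet implementation.

def linear(weight, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def merge_branches(w_base, b_base, w_ctrl, b_ctrl):
    """Fold two parallel linear branches into one weight and one bias."""
    w = [[a + c for a, c in zip(row_a, row_c)]
         for row_a, row_c in zip(w_base, w_ctrl)]
    b = [a + c for a, c in zip(b_base, b_ctrl)]
    return w, b

# Hypothetical base branch and control branch.
w_base = [[1.0, 2.0], [0.5, -1.0]]
b_base = [0.0, 1.0]
w_ctrl = [[0.25, 0.0], [0.0, 0.5]]
b_ctrl = [0.5, -0.5]
x = [2.0, 4.0]

# Training-time form: run both branches and add their outputs.
two_branch = [p + q for p, q in zip(linear(w_base, b_base, x),
                                    linear(w_ctrl, b_ctrl, x))]

# Inference-time form: merge once, then run a single branch.
w_merged, b_merged = merge_branches(w_base, b_base, w_ctrl, b_ctrl)
one_branch = linear(w_merged, b_merged, x)

print(two_branch, one_branch)
```

The merge is exact here because the sum of two linear maps applied to the same input is itself a linear map. Branches that receive different inputs or contain nonlinearities cannot be folded this simply.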
