DiffAge3D: Diffusion-based 3D-aware Face Aging
Junaid Wahid, Fangneng Zhan, Pramod Rao, Christian Theobalt
TL;DR
DiffAge3D tackles the lack of 3D-aware face aging by decoupling aging from camera viewpoint and leveraging a robust, inversion-free 3D data-generation pipeline based on EG3D and CLIP. It then introduces a diffusion-based aging model with an Aging Network, a viewpoint controller, and a Temporal Consistent Aging Module to achieve faithful aging while preserving identity and enabling multiview consistency. The approach demonstrates superior aging accuracy, identity preservation, and 3D consistency compared with 2D baselines, highlighting its potential for realistic 3D-aware face editing in entertainment and visualization. The work combines 3D GAN-based data synthesis with diffusion-based aging, offering a practical, scalable path to multiview aging without inversion bottlenecks and with controllable viewpoints.
Abstract
Face aging is the process of transforming an individual's appearance into a younger or older version of themselves. Existing face aging techniques are limited to 2D settings, which restricts their applicability given the growing demand for 3D face modeling. Moreover, existing aging methods struggle to perform faithful aging, maintain identity, and retain the fine details of the input images. Given these limitations and the need for a 3D-aware aging method, we propose DiffAge3D, the first 3D-aware aging framework that not only performs faithful aging and identity preservation but also operates in a 3D setting. Our framework models aging and camera pose separately, taking only a single image and a target age as input. It includes a robust 3D-aware aging dataset generation pipeline that uses a pre-trained 3D GAN and the rich text embedding capabilities of the CLIP model. Notably, dataset generation does not involve any inversion bottleneck. Instead, we randomly generate training samples from the latent space of the 3D GAN, which lets us manipulate the GAN's rich latent space to generate ages even across large age gaps. With the generated dataset, we train a viewpoint-aware diffusion-based aging model to control the camera pose and facial age. Through quantitative and qualitative evaluations, we demonstrate that DiffAge3D outperforms existing methods, particularly in multiview-consistent aging and fine-detail preservation.
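The inversion-free data generation described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the latent size, the linear latent edit, the (yaw, pitch) camera parameterization, and every function name below are assumptions made for illustration. In particular, the age direction here is a random placeholder, whereas the abstract states that the edit relies on CLIP text embeddings and a pre-trained 3D GAN. The sketch only shows the overall shape of the idea: latents are sampled rather than obtained by inverting real images, and the age edit is chosen independently of the camera pose.

```python
import numpy as np

LATENT_DIM = 512  # assumption: a typical GAN latent size; any value works here


def sample_latents(n, rng):
    """Draw random latent codes; no real image is inverted into latent space."""
    return rng.standard_normal((n, LATENT_DIM))


def edit_age(z, age_direction, strength):
    """Hypothetical linear edit: move z along a unit-length 'age' direction."""
    return z + strength * age_direction


def sample_camera(rng, max_yaw=0.4, max_pitch=0.2):
    """Hypothetical camera pose (yaw, pitch) in radians, independent of age."""
    return np.array([rng.uniform(-max_yaw, max_yaw),
                     rng.uniform(-max_pitch, max_pitch)])


def make_pairs(n, rng):
    """Build (source, target) training records with decoupled age and pose."""
    # Placeholder direction; a real pipeline would derive it from text guidance.
    direction = rng.standard_normal(LATENT_DIM)
    direction /= np.linalg.norm(direction)
    pairs = []
    for z in sample_latents(n, rng):
        strength = rng.uniform(-3.0, 3.0)  # signed shift: younger or older
        pairs.append({
            "source_latent": z,
            "target_latent": edit_age(z, direction, strength),
            "source_camera": sample_camera(rng),
            "target_camera": sample_camera(rng),
            "age_shift": strength,
        })
    return pairs, direction


if __name__ == "__main__":
    pairs, _ = make_pairs(4, np.random.default_rng(0))
    print(len(pairs), pairs[0]["target_latent"].shape)
```

In an actual system, each latent and camera pair would be rendered by the 3D GAN to produce the source and target images used to train the diffusion model; the sketch stops at the latent and pose level because the rendering details are not given in the abstract.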
