DiffAge3D: Diffusion-based 3D-aware Face Aging

Junaid Wahid, Fangneng Zhan, Pramod Rao, Christian Theobalt

TL;DR

DiffAge3D tackles the lack of 3D-aware face aging by decoupling aging from camera viewpoint and leveraging a robust, inversion-free 3D data-generation pipeline based on EG3D and CLIP. It then introduces a diffusion-based aging model with an Aging Network, a viewpoint controller, and a Temporal Consistent Aging Module to achieve faithful aging while preserving identity and enabling multiview consistency. The approach demonstrates superior aging accuracy, identity preservation, and 3D consistency compared with 2D baselines, highlighting its potential for realistic 3D-aware face editing in entertainment and visualization. The work combines 3D GAN-based data synthesis with diffusion-based aging, offering a practical, scalable path to multiview aging without inversion bottlenecks and with controllable viewpoints.

Abstract

Face aging is the process of converting an individual's appearance to a younger or older version of themselves. Existing face aging techniques have been limited to 2D settings, which restricts their applicability given the growing demand for 3D face modeling. Moreover, existing aging methods struggle to perform faithful aging, maintain identity, and retain the fine details of the input images. Given these limitations and the need for a 3D-aware aging method, we propose DiffAge3D, the first 3D-aware aging framework that not only performs faithful aging and identity preservation but also operates in a 3D setting. Our aging framework models aging and camera pose separately, taking only a single image and a target age as input. The framework includes a robust 3D-aware aging dataset generation pipeline that uses a pre-trained 3D GAN and the rich text embedding capabilities of the CLIP model. Notably, dataset generation involves no inversion bottleneck. Instead, we randomly generate training samples from the latent space of the 3D GAN, which lets us manipulate its rich latent space to generate ages even across large gaps. With the generated dataset, we train a viewpoint-aware diffusion-based aging model to control the camera pose and facial age. Through quantitative and qualitative evaluations, we demonstrate that DiffAge3D outperforms existing methods, particularly in multiview-consistent aging and fine-detail preservation.
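The inversion-free idea in the abstract can be illustrated with a toy sketch: latent codes are drawn directly from the prior instead of being recovered from real photos, each code is shifted to several ages, and a camera pose is sampled independently of the age edit. Everything here is an assumption made for illustration. The function names, the 512-dimensional latent, the linear `age_direction`, and the yaw/pitch ranges are hypothetical, and the random direction is only a stand-in for the CLIP- and age-predictor-guided editing the paper describes. No generator or renderer is called.

```python
import numpy as np


def sample_latents(num_ids, dim, rng):
    """Draw latent codes from a standard normal prior (no image inversion)."""
    return rng.standard_normal((num_ids, dim))


def edit_age(latents, age_direction, strength):
    """Shift latents along a hypothetical unit-norm age direction.

    A linear shift is an illustrative simplification, not the paper's method.
    """
    return latents + strength * age_direction


def sample_cameras(num, rng, max_yaw=0.5, max_pitch=0.3):
    """Sample yaw/pitch in radians, independently of the age edit."""
    yaw = rng.uniform(-max_yaw, max_yaw, size=num)
    pitch = rng.uniform(-max_pitch, max_pitch, size=num)
    return np.stack([yaw, pitch], axis=1)


def build_samples(num_ids=4, dim=512, strengths=(-2.0, 0.0, 2.0), seed=0):
    """Return one record per (identity, age strength) with its own camera pose."""
    rng = np.random.default_rng(seed)
    # Placeholder direction; a real pipeline would have to learn or search for it.
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    latents = sample_latents(num_ids, dim, rng)
    samples = []
    for strength in strengths:
        edited = edit_age(latents, direction, strength)
        cameras = sample_cameras(num_ids, rng)
        for i in range(num_ids):
            samples.append({
                "identity": i,
                "strength": strength,
                "latent": edited[i],
                "camera": cameras[i],
            })
    return samples, direction
```

Because every identity starts from a sampled latent, the same code can be pushed to very different ages without first fitting it to a photograph, which is the property the abstract attributes to skipping inversion.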

Paper Structure (22 sections, 10 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Given an input image (left) and a target age (written on top), our method synthesizes consistent multiview aging results. These results can be rendered at arbitrary camera angles.
  • Figure 2: We present DiffAge3D, a two-stage solution to the multiview aging problem. In the first stage, we train a multiview aging data generation pipeline without using any dataset. We sample latent vectors from StyleGAN2 and guide the aging process with a CLIP model [Radford2021learning] and a pretrained age predictor. Lastly, we leverage a state-of-the-art 3D-aware generative model [chan2021] to obtain faithful multiview aging results. In the second stage, we propose a 3D diffusion-based aging framework that generates multiview aging results from only an input image and a target age. The model has three parts: the Aging Network, the Pose Controller, and the Temporal Consistent Aging Module. The whole method is trained on a multiview synthetic aging dataset produced by our data generation pipeline.
  • Figure 3: Qualitative comparison of our data generation pipeline and SAM+EG3D. We compare both methods at two target ages: young (10 years old) and old (70 years old). Best viewed zoomed in.
  • Figure 4: Qualitative comparison of aging results from 0 to 70 years at 10-year intervals. The left column shows the aging results on the input view, while the right column shows aging results from a novel view. Best viewed zoomed in.
  • Figure 5: Qualitative comparison of aging results generated from a novel view on the CelebA-HQ dataset across three age categories: Toddler (2 years old), Adult (25 years old), and Old (70 years old). Best viewed zoomed in.
  • ...and 5 more figures