From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging

Tao Liu, Dafeng Zhang, Gengchen Li, Shizhuo Liu, Yongqi Song, Senmao Li, Shiqi Yang, Boqian Li, Kai Wang, Yaxing Wang

TL;DR

Cradle2Cane tackles the Age-ID trade-off in lifespan face aging by decoupling age accuracy and identity preservation into a two-pass diffusion framework built on few-step SDXL-Turbo. The first pass uses adaptive noise injection (AdaNI) for text-guided aging, while the second pass conditions on identity-aware embeddings (IDEmb: SVR-ArcFace and Rotate-CLIP) to reinforce identity during denoising. End-to-end training jointly optimizes identity, age, and perceptual quality losses. On CelebA-HQ, the method reports higher age accuracy and identity consistency than existing face aging methods under Face++ and Qwen-VL evaluation, and it generalizes to in-the-wild images with fast inference. Suggested applications include entertainment, healthcare, and privacy-aware aging analysis.

Abstract

Face aging has become a crucial task in computer vision, with applications ranging from entertainment to healthcare. However, existing methods struggle to achieve a realistic and seamless transformation across the entire lifespan, especially when handling large age gaps or extreme head poses. The core challenge lies in balancing age accuracy and identity preservation, which we refer to as the Age-ID trade-off. Most prior methods either prioritize age transformation at the expense of identity consistency or vice versa. In this work, we address this issue by proposing a two-pass face aging framework, named Cradle2Cane, based on few-step text-to-image (T2I) diffusion models. The first pass targets age accuracy by introducing an adaptive noise injection (AdaNI) mechanism, guided by a textual condition that describes the age and gender of the given person. By adjusting the noise level, we can control the strength of aging while allowing more flexibility in transforming the face. Identity preservation is only weakly enforced in this pass, in order to facilitate stronger age transformations. In the second pass, we enhance identity preservation while maintaining age-specific features by conditioning the model on two identity-aware embeddings (IDEmb): SVR-ArcFace and Rotate-CLIP. This pass denoises the transformed image from the first pass, ensuring stronger identity preservation without compromising aging accuracy. Both passes are jointly trained end to end. Extensive experiments on the CelebA-HQ test dataset, evaluated through Face++ and Qwen-VL protocols, show that Cradle2Cane outperforms existing face aging methods in age accuracy and identity consistency. Code is available at https://github.com/byliutao/Cradle2Cane.
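
As a rough illustration of the adaptive noise injection idea described in the abstract, the toy sketch below picks a noise strength from the size of the age gap and mixes Gaussian noise into a latent vector. It is not the authors' implementation: the function names, the thresholds (10 and 30 years), the three strength levels, and the direct use of the strength as a noise variance are all assumptions made for this example. A real few-step diffusion pipeline would map the strength to a scheduler timestep instead.

```python
import math
import random


def noise_strength(age_gap, thresholds=(10, 30), levels=(0.25, 0.5, 0.75)):
    """Pick a noise strength from the absolute age gap.

    The thresholds and levels are hypothetical values for illustration only.
    """
    gap = abs(age_gap)
    for bound, level in zip(thresholds, levels):
        if gap <= bound:
            return level
    # Larger gaps than every threshold get the strongest level.
    return levels[-1]


def inject_noise(latent, strength, rng):
    """Variance-preserving mix: sqrt(1 - s) * x + sqrt(s) * eps, eps ~ N(0, 1).

    Simplification: the strength is used directly as the noise variance.
    """
    keep = math.sqrt(1.0 - strength)
    add = math.sqrt(strength)
    return [keep * x + add * rng.gauss(0.0, 1.0) for x in latent]


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.3, -1.2, 0.8, 0.1]  # stand-in for an encoded face image
    for gap in (5, 20, 50):
        s = noise_strength(gap)
        noisy = inject_noise(latent, s, rng)
        print(gap, s, [round(v, 3) for v in noisy])
```

Under these assumptions, a larger age gap selects a larger strength, so less of the source latent survives. This mirrors the stated behavior that more noise allows stronger aging at the cost of weaker identity preservation.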

Paper Structure

This paper contains 34 sections, 16 equations, 12 figures, 6 tables, 1 algorithm.

Figures (12)

  • Figure 1: Age–ID trade-off curves across sixty age shift values. We compute the Age/ID cosine similarities over 100 human faces for age shifts from 1 to 60, together with the corresponding harmonic means. Existing approaches tend to favor either age accuracy or identity consistency, resulting in imbalanced performance across the entire lifespan. In contrast, our method Cradle2Cane achieves a better balance between the two objectives. More details and results are provided in the Appendix.
  • Figure 2: (Left) We illustrate the effects of injecting three different levels of noise into the input image, as used in the 4-step SDXL-Turbo image-to-image pipeline. As is visually evident, higher noise levels lead to more pronounced age transformations at the cost of reduced identity preservation. (Right) We present a statistical analysis on 100 human faces that quantitatively demonstrates the Age-ID trade-off inherent in face aging tasks. Specifically, we evaluate three representative noise injection levels and measure their impact on age accuracy and identity consistency.
  • Figure 3: Our method Cradle2Cane consists of two passes: the first pass employs adaptive noise injection (AdaNI) to enhance age accuracy, while the second pass incorporates identity-aware embeddings (IDEmb), including SVR-ArcFace and Rotate-CLIP embeddings, to improve identity consistency. While training the MLPs and UNet-LoRA modules, we jointly optimize an identity loss between the source and target face images, as well as age and quality losses over the target images.
  • Figure 4: Qualitative comparison with existing face aging methods across lifespan ages. Our method Cradle2Cane can even reproduce natural changes in hair, which previous methods cannot. For comparisons on in-the-wild images, please refer to the corresponding figure in the Appendix.
  • Figure 5: (Left) When applied to in-the-wild real human faces, Cradle2Cane performs better, whereas existing methods often fail. (Right) Cradle2Cane can also modify gender and emotion attributes while performing age transformation on human faces.
  • ...and 7 more figures
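
The harmonic mean that Figure 1 uses to summarize the Age-ID balance can be sketched as follows. This is a minimal example assuming both scores are similarities in [0, 1]; the function name and the sample values are hypothetical and are not taken from the paper.

```python
def harmonic_mean(age_score, id_score):
    """Harmonic mean of an age-accuracy score and an identity score.

    Returns 0.0 if either score is non-positive, so a method that fails
    on one objective cannot be rescued by the other.
    """
    if age_score <= 0 or id_score <= 0:
        return 0.0
    return 2 * age_score * id_score / (age_score + id_score)


if __name__ == "__main__":
    # Hypothetical scores: one method favors age, the other is balanced.
    print(harmonic_mean(0.9, 0.3))
    print(harmonic_mean(0.6, 0.6))
```

Because the harmonic mean is pulled toward the smaller of the two values, a balanced pair such as (0.6, 0.6) scores higher than a lopsided pair such as (0.9, 0.3), even though both have the same arithmetic mean. This is why it suits a trade-off curve.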