Only a Matter of Style: Age Transformation Using a Style-Based Regression Model

Yuval Alaluf, Or Patashnik, Daniel Cohen-Or

TL;DR

This work introduces Style-based Age Manipulation (SAM), a data-efficient, end-to-end image-to-image framework that ages a single input face by encoding real images into StyleGAN's latent space under a target age shift. By leveraging a fixed StyleGAN2 generator, a learned aging encoder, and a pre-trained age regressor, SAM learns a non-linear latent path that disentangles aging from other attributes and enables fine-grained control, cycle-consistency training, and additional editing such as patch edits and style mixing. The approach outperforms state-of-the-art lifelong-age methods (LIFE, HRFAE) in both qualitative and quantitative assessments and surpasses latent-space baselines (InterFaceGAN, StyleFlow) in realism and identity preservation on real images. The work also analyzes the learned latent paths, showing non-linearity and better manifold alignment, and discusses limitations related to extreme poses, hair changes, and age-prediction biases, laying groundwork for future extensions to broader editing tasks.

Abstract

The task of age transformation illustrates the change of an individual's appearance over time. Accurately modeling this complex transformation over an input facial image is extremely challenging as it requires making convincing, possibly large changes to facial features and head shape, while still preserving the input identity. In this work, we present an image-to-image translation method that learns to directly encode real facial images into the latent space of a pre-trained unconditional GAN (e.g., StyleGAN) subject to a given aging shift. We employ a pre-trained age regression network to explicitly guide the encoder in generating the latent codes corresponding to the desired age. In this formulation, our method approaches the continuous aging process as a regression task between the input age and desired target age, providing fine-grained control over the generated image. Moreover, unlike approaches that operate solely in the latent space using a prior on the path controlling age, our method learns a more disentangled, non-linear path. Finally, we demonstrate that the end-to-end nature of our approach, coupled with the rich semantic latent space of StyleGAN, allows for further editing of the generated images. Qualitative and quantitative evaluations show the advantages of our method compared to state-of-the-art approaches.
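The regression view described in the abstract can be made concrete with a small sketch. It assumes that ages are normalized to [0, 1] by dividing by 100 and that the age loss is a squared difference between the target age and the age the regressor predicts for the generated image; the abstract states neither detail. `toy_predict_age` is a hypothetical stand-in for the pre-trained age regressor, not a real API.

```python
from typing import Callable, Dict, Sequence

MAX_AGE = 100.0  # assumption: normalization constant, not stated in the abstract


def normalize_age(age_years: float) -> float:
    """Map an age in years to [0, 1], clamping out-of-range values."""
    return min(max(age_years / MAX_AGE, 0.0), 1.0)


def age_loss(predicted_age: float, target_age: float) -> float:
    """Squared error between normalized predicted and target ages (assumed form)."""
    diff = normalize_age(predicted_age) - normalize_age(target_age)
    return diff * diff


def toy_predict_age(image: Sequence[float]) -> float:
    """Hypothetical stand-in regressor: reads mean pixel intensity as age / 100."""
    return 100.0 * sum(image) / len(image)


def evaluate_targets(
    image: Sequence[float],
    targets: Sequence[float],
    predict_age: Callable[[Sequence[float]], float] = toy_predict_age,
) -> Dict[float, float]:
    """Return the age loss of one generated image against several target ages."""
    predicted = predict_age(image)
    return {t: age_loss(predicted, t) for t in targets}


if __name__ == "__main__":
    fake_output = [0.3, 0.3, 0.3, 0.3]  # toy "image" the stand-in reads as roughly age 30
    for target, loss in evaluate_targets(fake_output, [10, 30, 70]).items():
        print(f"target age {target}: loss {loss:.4f}")
```

Because the target age is a continuous value rather than a class label, the loss varies smoothly as the target changes, which is what makes fine-grained control over the output age possible in this kind of formulation.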

Paper Structure

This paper contains 35 sections, 16 equations, 22 figures, 3 tables.

Figures (22)

  • Figure 1: Our SAM architecture. The network receives an input face image and a desired target age $\alpha_t$. First, the aging encoder $E_{age}$ is tasked with extracting feature maps at $3$ different spatial scales. Then, $18$ map2style blocks, introduced in \cite{richardson2020encoding}, are used to gradually down-sample the $3$ feature maps into $18$ different $512$-dimensional style vectors, thereby encoding the input image into the $\mathcal{W}+$ StyleGAN latent space. We additionally employ a fixed, pre-trained pSp \cite{richardson2020encoding} encoder to extract the $\mathcal{W}+$ latent code of $\textbf{x}$, denoted $\textbf{w}^*$, which is then added to the age-transformed latent code, denoted $E_{age}(\textbf{x}_{age})$. A pre-trained StyleGAN is then used to generate the desired age-transformed image using the aggregated latent code. During training, $\mathcal{L}_{2}$, $\mathcal{L}_{LPIPS}$, and $\mathcal{L}_{ID}$ ensure visual similarity and identity preservation, while $\mathcal{L}_{reg}$ encourages the learned latent codes to be closer to the average latent code. Finally, $\mathcal{L}_{age}$ guides the encoder in generating the desired age-transformed latent code. Observe that during training, only the aging encoder and map2style blocks are trained. Moreover, $\mathcal{L}_{LPIPS}$, $\mathcal{L}_{ID}$, and $\mathcal{L}_{age}$ are computed via fixed, pre-trained networks as described in Section \ref{losses}.
  • Figure 2: To address the challenge of the unsupervised setting, a cycle consistency pass is performed to recover the input at the source age $\alpha_s$.
  • Figure 3: Aging results generated using SAM. Observe the state-of-the-art image quality achieved by leveraging a fixed, pretrained StyleGAN generator.
  • Figure 4: Identity similarity results using the ArcFace \cite{deng2019arcface} recognition network. For each row, we compute the cosine similarity between the query and the remaining images. Image credits in order: deniro_1, deniro_3, deniro_2008, deniro_4, deniro_5, hanks_1, hanks_2, hanks_3, hanks_4, hanks_5.
  • Figure 5: Qualitative comparison of age transformation results with (a) LIFE \cite{orel2020lifespan} and (b) HRFAE \cite{yao2020high} on the CelebA-HQ \cite{karras2017progressive} test set. For translating our images to the age groups in LIFE, we set the target age equal to the median age of each group. Best viewed zoomed-in. Additional results can be found in Appendix \ref{additional_results}.
  • ...and 17 more figures
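
The forward pass in Figure 1 and the cycle pass in Figure 2 can be sketched with toy stand-ins. Only three details are taken from the captions: the $18 \times 512$ shape of a $\mathcal{W}+$ code, the addition of the aging encoder's output to the fixed pSp code $\textbf{w}^*$, and the second pass that maps the aged result back to the source age $\alpha_s$. The functions `toy_psp_encoder`, `toy_aging_encoder`, and `toy_generator` are hypothetical placeholders invented for illustration, not the pre-trained networks, and the toy deliberately conflates pixel values with ages so the result is easy to check by hand.

```python
from typing import List, Tuple

NUM_STYLES, STYLE_DIM = 18, 512  # W+ code shape given in the Figure 1 caption
Latent = List[List[float]]


def toy_psp_encoder(image: List[float]) -> Latent:
    """Hypothetical stand-in for the fixed pSp encoder: returns w* for an image."""
    mean = sum(image) / len(image)
    return [[mean] * STYLE_DIM for _ in range(NUM_STYLES)]


def toy_aging_encoder(image: List[float], target_age: float) -> Latent:
    """Hypothetical stand-in for E_age: an offset that depends on the target age."""
    mean = sum(image) / len(image)
    return [[target_age - mean] * STYLE_DIM for _ in range(NUM_STYLES)]


def aggregate(w_star: Latent, w_age: Latent) -> Latent:
    """Element-wise sum of the pSp code and the aging encoder's code."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(w_star, w_age)]


def toy_generator(w: Latent) -> List[float]:
    """Hypothetical stand-in for StyleGAN: a 4-pixel image from the mean style value."""
    mean = sum(sum(row) for row in w) / (NUM_STYLES * STYLE_DIM)
    return [mean] * 4


def sam_forward(image: List[float], target_age: float) -> List[float]:
    """One pass: encode the image, add the result to w*, and generate."""
    w = aggregate(toy_psp_encoder(image), toy_aging_encoder(image, target_age))
    return toy_generator(w)


def cycle_pass(
    image: List[float], source_age: float, target_age: float
) -> Tuple[List[float], List[float]]:
    """Age the input to the target age, then map the result back to the source age."""
    aged = sam_forward(image, target_age)
    recovered = sam_forward(aged, source_age)
    return aged, recovered


if __name__ == "__main__":
    x = [0.3, 0.3, 0.3, 0.3]  # toy input whose pixel mean plays the role of its age
    aged, recovered = cycle_pass(x, source_age=0.3, target_age=0.7)
    print("aged:", aged)
    print("recovered:", recovered)
```

In a training loop, the recovered image would presumably be compared against the original input with the same reconstruction-style losses used in the forward pass, which gives a supervision signal even though no ground-truth aged image exists.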