Video Face Re-Aging: Toward Temporally Consistent Face Re-Aging

Abdul Muqeet, Kyuchul Lee, Bumsoo Kim, Yohan Hong, Hyungrae Lee, Woonggon Kim, KwangHee Lee

TL;DR

This work addresses temporally consistent face re-aging in videos by building a synthetic, paired video dataset and a baseline recurrent video framework. The generative model, a recurrent U-Net with dual discriminators, uses input and output age masks to re-age faces across frames. Two new temporal metrics, TRWC and T-Age, are proposed to evaluate the results. Experiments on CelebV-HQ and VFHQ show improved age transformation accuracy and temporal coherence over state-of-the-art methods, and user studies corroborate the gains in temporal consistency. The data-centric approach and proposed metrics offer practical tools for developing and evaluating temporally stable video re-aging systems, with implications for the graphics, forensics, and media industries, while acknowledging ethical considerations and the biases inherent in synthetic data.

Abstract

Video face re-aging deals with altering the apparent age of a person to a target age in videos. This problem is challenging due to the lack of paired video datasets that maintain temporal consistency in identity and age. Most re-aging methods process each image individually without considering the temporal consistency of videos. While some existing works address temporal coherence through video facial attribute manipulation in latent space, they often fail to deliver satisfactory performance in age transformation. To tackle these issues, we propose (1) a novel synthetic video dataset that features subjects across a diverse range of age groups; (2) a baseline architecture designed to validate the effectiveness of our proposed dataset; and (3) novel metrics tailored explicitly for evaluating the temporal consistency of video re-aging techniques. Our comprehensive experiments on public datasets, including VFHQ and CelebA-HQ, show that our method outperforms existing approaches in age transformation accuracy and temporal consistency. Notably, in user studies, our method was preferred for temporal consistency by 48.1% of participants for the older direction and by 39.3% for the younger direction.

Paper Structure

This paper contains 38 sections, 9 equations, 11 figures, 11 tables.

Figures (11)

  • Figure 1: Our proposed pipeline to construct the video dataset for re-aging. First, high-resolution synthetic facial images are created using StyleGAN [karras2019style]. Next, images of each individual at different target ages are generated using SAM [alaluf2021only] for age transformation. Key frames are then produced with OSFV, which alters the pose and expression of these synthetic images; this is done without driving images, using random values for the rotation, translation, and expression keypoints instead. Finally, motion is added between these key frames using FILM [reda2022film], creating smooth, high-fidelity motion videos of subjects at different ages.
  • Figure 2: Overview of our generator for video re-aging.
  • Figure 3: Conceptual overview of proposed TRWC.
  • Figure 4: Performance comparison with FRAN [zoss2022production] across all target ages.
  • Figure 5: Qualitative comparison with existing state-of-the-art methods. The target age is set to 85.
  • ...and 6 more figures