Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions

Takuya Igaue, Catia Correia-Caeiro, Akito Yoshida, Takako Miyabe-Nishiwaki, Ryusuke Hayashi

TL;DR

This study tackles the limited diversity of macaque facial expressions in training data by introducing motion-transfer–based data augmentation integrated with StyleGAN2-ADA. By training in two rounds with latent-space–guided sampling and a targeted eye-region loss, the method achieves robust inversion and rich expression editing across two macaque species. The work also analyzes StyleSpace channels to reveal disentangled axes corresponding to mouth and eye movements, enabling linear edits along expression and identity dimensions. The approach enables diverse, controllable macaque face generation with potential applications in automated facial action coding (MaqFACS), welfare monitoring, and cross-species comparative research, advancing ecological validity in neuroscience experiments.

Abstract

Generating animal faces using generative AI techniques is challenging because the available training images are limited both in quantity and variation, particularly for facial expressions across individuals. In this study, we focus on macaque monkeys, widely studied in systems neuroscience and evolutionary research, and propose a method to generate their facial expressions using a style-based generative image model (i.e., StyleGAN2). To address data limitations, we implemented: 1) data augmentation by synthesizing new facial expression images using motion transfer to animate still images with computer graphics, 2) sample selection based on the latent representation of macaque faces from an initially trained StyleGAN2 model to ensure variation and uniform sampling in the training dataset, and 3) loss function refinement to ensure the accurate reproduction of subtle movements, such as eye movements. Our results demonstrate that the proposed method enables the generation of diverse facial expressions for multiple macaque individuals, outperforming models trained solely on original still images. Additionally, we show that our model is effective for style-based image editing, where specific style parameters correspond to distinct facial movements. These findings underscore the model's potential for disentangling motion components as style parameters, providing a valuable tool for research on macaque facial expressions.
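As a rough illustration of the third point (a loss term that emphasizes subtle eye movements), the sketch below adds an L2 term restricted to an eye-region mask on top of a whole-image L2 term. It is not the authors' implementation: the function name `eye_weighted_l2`, the mask construction, and the weight `eye_weight` are all assumptions made for this example, and the abstract does not state how the eye region is located or weighted.

```python
import numpy as np

def eye_weighted_l2(target, recon, eye_mask, eye_weight=5.0):
    """Whole-image mean squared error plus an extra L2 term on the eye region.

    target, recon: float arrays of shape (H, W, C) in the same value range.
    eye_mask: boolean array of shape (H, W), True for pixels in the eye region
        (hypothetical; how the region is found is not specified in the source).
    eye_weight: hypothetical weight for the eye-region term.
    """
    sq_err = (target - recon) ** 2
    full_term = sq_err.mean()
    # Boolean indexing with an (H, W) mask selects the masked pixels across
    # all channels; skip the term if the mask is empty.
    eye_term = sq_err[eye_mask].mean() if eye_mask.any() else 0.0
    return float(full_term + eye_weight * eye_term)

# Toy data: a reconstruction that is wrong at exactly one pixel.
target = np.zeros((4, 4, 3))
recon = np.zeros((4, 4, 3))
recon[1, 1, :] = 1.0

mask_on_error = np.zeros((4, 4), dtype=bool)
mask_on_error[1, 1] = True   # the error falls inside the "eye" region

mask_elsewhere = np.zeros((4, 4), dtype=bool)
mask_elsewhere[0, 0] = True  # the error falls outside the "eye" region

loss_inside = eye_weighted_l2(target, recon, mask_on_error)
loss_outside = eye_weighted_l2(target, recon, mask_elsewhere)
```

With this toy data, the same single-pixel error costs far more when it lies inside the masked region than outside it, which is the intended effect of a region-specific term: small errors around the eyes are not averaged away by the rest of the face.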

Paper Structure

This paper contains 42 sections, 2 equations, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Results of manipulating macaque face images using the StyleGAN2 model trained using the proposed method. The images labeled Inversion were generated from the estimated latent codes of the source images using the trained model and demonstrate successful latent learning. The trained model also acquired a well-disentangled latent representation, which enabled the generation of photorealistic macaque face images with various semantic edits, such as adjusting the head orientation to the right (third column labeled Head orientation) and alternating the appearance between two species (i.e., Japanese macaque and Rhesus macaque), labeled Species. Additionally, the model can generate various characteristic macaque facial expressions, including Threat, Scream, and Bared-teeth, in addition to eye movements such as Blink and Look-left.
  • Figure 2: Overview of our proposed method for generating diverse macaque facial expressions using the StyleGAN2 model. We incorporated facial expression data augmentation via motion transfer techniques, using driving videos created by realistic CG models (MF3D) to systematically transfer labeled facial expressions to still images of diverse macaque individuals. This method allowed us to expand the training dataset in terms of both quantity and expression variations. Additionally, in the second round of training, we applied a specialized loss function to enhance the inversion quality of subtle facial movements, particularly around the eyes, and selected our training images to correct for biases toward neutral facial expressions and ensure greater diversity.
  • Figure 3: Qualitative comparison of the image inversion results using different training conditions. ReStyle (Alaluf et al., 2021) trained solely on the still image dataset failed to reconstruct facial expressions, particularly for images displaying closed eyes or open mouths, such as the images in the sixth and seventh columns. By contrast, the proposed method successfully inverted the mouth movements and images showing exposed teeth, as indicated in the third and bottom rows. Moreover, the L2 loss for the eyes improved inversion quality around the eye regions, including the reconstruction of eye movements, as seen in the first column of the bottom row, which replicates the small sclera in macaques and depicts the eye looking to the left.
  • Figure 4: Inversion error for each facial expression. Errors were computed over 20 randomly selected test images inverted using the model trained only on still images for 120,000 iterations, labeled "ReStyle", and the model trained using our method for 320,000 iterations, labeled "Ours". Asterisks mark the facial expressions for which the error differed significantly between the two models, based on a pairwise t-test with Holm adjustment.
  • Figure 5: Editing results using annotation information about facial expression types. The editing strength was manually adjusted for each image and condition to produce the results shown in this figure.
  • ...and 8 more figures
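
The Figure 4 caption mentions a Holm adjustment for the per-expression comparisons. For readers unfamiliar with it, here is a minimal sketch of the standard Holm step-down procedure for adjusting p-values. The function name `holm_adjust` and the example p-values are hypothetical and are not taken from the paper; the sketch only illustrates the adjustment itself, not the authors' analysis pipeline.

```python
import numpy as np

def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)  # indices that sort p ascending
    # The k-th smallest p-value (k = 0..m-1) is multiplied by (m - k).
    scaled = (m - np.arange(m)) * p[order]
    # Enforce monotonicity with a running maximum, then cap at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # restore the original order
    return adjusted

# Made-up raw p-values, one per facial expression category.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = holm_adjust(raw_p)            # array([0.03, 0.06, 0.06, 0.02])
significant = adj_p < 0.05            # which comparisons would get an asterisk
```

In this toy case, the two smallest raw p-values stay below 0.05 after adjustment, while the other two no longer do, even though all four raw values were below 0.05.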