Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions
Takuya Igaue, Catia Correia-Caeiro, Akito Yoshida, Takako Miyabe-Nishiwaki, Ryusuke Hayashi
TL;DR
This study tackles the limited diversity of macaque facial expressions in training data by introducing motion-transfer–based data augmentation integrated with StyleGAN2-ADA. By training in two rounds with latent-space–guided sampling and a targeted eye-region loss, the method achieves robust inversion and rich expression editing across two macaque species. The work also analyzes StyleSpace channels to reveal disentangled axes corresponding to mouth and eye movements, enabling linear edits along expression and identity dimensions. The approach enables diverse, controllable macaque face generation with potential applications in automated facial action coding (MaqFACS), welfare monitoring, and cross-species comparative research, advancing ecological validity in neuroscience experiments.
Abstract
Generating animal faces with generative AI techniques is challenging because the available training images are limited in both quantity and variation, particularly for facial expressions across individuals. In this study, we focus on macaque monkeys, which are widely studied in systems neuroscience and evolutionary research, and propose a method to generate their facial expressions using a style-based generative image model (StyleGAN2). To address data limitations, we implemented: 1) data augmentation that synthesizes new facial expression images by using motion transfer to animate still images with computer graphics, 2) sample selection based on the latent representation of macaque faces from an initially trained StyleGAN2 model, to ensure variation and uniform sampling in the training dataset, and 3) loss function refinement to ensure accurate reproduction of subtle movements, such as eye movements. Our results demonstrate that the proposed method enables the generation of diverse facial expressions for multiple macaque individuals, outperforming models trained solely on the original still images. Additionally, we show that our model is effective for style-based image editing, where specific style parameters correspond to distinct facial movements. These findings underscore the model's potential for disentangling motion components as style parameters, providing a valuable tool for research on macaque facial expressions.
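
The abstract does not specify how the latent-based sample selection in step 2 is carried out. As an illustration only, the sketch below assumes one plausible strategy, greedy farthest-point sampling over latent codes, which tends to pick samples spread evenly across the latent space rather than concentrated in dense clusters. The function name `farthest_point_selection` and the toy two-dimensional latents are hypothetical and are not taken from the paper.

```python
import math
import random


def farthest_point_selection(latents, k, seed=0):
    """Greedily pick k indices whose latent vectors are spread far apart.

    Assumption: `latents` is a list of equal-length numeric tuples, e.g.
    latent codes obtained by inverting candidate images with a first-round
    generator. This is a generic technique, not the authors' procedure.
    """
    if not 0 < k <= len(latents):
        raise ValueError("k must be between 1 and len(latents)")
    rng = random.Random(seed)
    first = rng.randrange(len(latents))
    selected = [first]
    # Distance from each candidate to its nearest selected sample;
    # already-selected candidates are marked with -1 so they are never re-picked.
    nearest = [math.dist(v, latents[first]) for v in latents]
    nearest[first] = -1.0
    while len(selected) < k:
        # Take the candidate farthest from everything selected so far.
        idx = max(range(len(latents)), key=nearest.__getitem__)
        selected.append(idx)
        nearest = [
            -1.0 if d < 0 else min(d, math.dist(v, latents[idx]))
            for d, v in zip(nearest, latents)
        ]
        nearest[idx] = -1.0
    return selected


# Toy example: a dense cluster (e.g. many near-identical neutral faces)
# plus three rare outliers (e.g. unusual expressions).
rng = random.Random(1)
cluster = [(rng.gauss(0, 0.01), rng.gauss(0, 0.01)) for _ in range(50)]
outliers = [(10.0, 0.0), (0.0, 10.0), (-10.0, -10.0)]
picked = farthest_point_selection(cluster + outliers, k=4)
print(sorted(picked))
```

In this toy setting the four selected samples include all three outliers and only one member of the dense cluster, which is the kind of rebalancing that selection for variation and uniform coverage aims at. Real latent codes would be high-dimensional, and a different distance or density-based criterion may be more appropriate.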
