FERGI: Automatic Scoring of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction
Shuangquan Feng, Junhua Ma, Virginia R. de Sa
TL;DR
This work introduces automatic scoring of user preferences for text-to-image generation from spontaneous facial expressions. The authors collect the FERGI dataset and train FAU-Net to map facial action unit activations to a valence score. The FAU-Net score complements existing pre-trained human-preference models (ImageReward, PickScore, HPS v2): integrated scoring reaches up to 68.64% accuracy on image-pair preference tasks, indicating improved alignment with human judgments. The work demonstrates that facial expressions can supply annotation signals at no extra effort to the user, which can guide fine-tuning of text-to-image models, and it suggests that the approach may generalize to other generation tasks. It also acknowledges practical limitations such as user camera usage and participant awareness. Overall, FERGI offers a scalable, complementary avenue for capturing user preferences and enhancing perceptual quality in generation systems.
Abstract
Researchers have proposed using human preference feedback data to fine-tune text-to-image generative models. However, the scalability of human feedback collection has been limited by its reliance on manual annotation. We therefore develop and test a method to automatically score user preferences from their spontaneous facial expression reactions to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. We develop FAU-Net (Facial Action Units Neural Network), which receives inputs from an AU estimation model, to automatically score user preferences for text-to-image generation based on facial expression reactions. This score is complementary to the pre-trained scoring models, which are based on the input text prompts and generated images. Integrating our FAU-Net valence score with the pre-trained scoring models improves their consistency with human preferences. This method of automatic annotation with facial expression analysis can potentially be generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is available at the same link for research purposes.
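
To make the idea of integrating a facial-expression valence score with a pre-trained scoring model concrete, the following is a minimal sketch and not the authors' implementation. It assumes the two scores are combined by a simple weighted sum and that the image with the higher combined score in a pair is predicted as preferred. The names `ImageScores`, `integrated_score`, `prefer_first`, `pair_accuracy`, and the `weight` parameter are hypothetical, and the numbers in the usage note are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageScores:
    """Scores for one generated image (both fields are illustrative)."""
    model_score: float    # score from a pre-trained preference model
    valence_score: float  # valence score estimated from the facial reaction


def integrated_score(s: ImageScores, weight: float = 0.5) -> float:
    # Assumption: a weighted sum; the actual integration rule may differ.
    return s.model_score + weight * s.valence_score


def prefer_first(a: ImageScores, b: ImageScores, weight: float = 0.5) -> bool:
    # Predict that the first image is preferred if its combined score is higher.
    return integrated_score(a, weight) > integrated_score(b, weight)


def pair_accuracy(
    pairs: List[Tuple[ImageScores, ImageScores]],
    labels: List[bool],
    weight: float = 0.5,
) -> float:
    # labels[i] is True when the user actually preferred the first image of pair i.
    correct = sum(
        prefer_first(a, b, weight) == label
        for (a, b), label in zip(pairs, labels)
    )
    return correct / len(labels)
```

With made-up scores such as `ImageScores(0.2, 0.9)` and `ImageScores(0.3, -0.5)`, setting `weight=0` reduces the prediction to the pre-trained model alone, while a positive weight lets a strong facial reaction change which image is predicted as preferred. In practice the weight would be chosen on held-out data.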
