Emotional Images: Assessing Emotions in Images and Potential Biases in Generative Models
Maneet Mehta, Cody Buntain
TL;DR
The paper addresses a potential negative-emotion bias in AI-generated imagery and its amplification in online media. It evaluates three emotion-recognition strategies (zero-shot vision-language models, fine-tuned vision models, and auto-captioning) on EmoSet, finding that a fine-tuned ViT yields the strongest performance ($F_1$ up to $0.7343$). Cross-modal analysis shows that generated images skew toward negative emotions, notably fear, relative to the prompts that produced them, a pattern observed across both Stable Diffusion and GPT-4o workflows. The work suggests a multidisciplinary path to align AI emotion understanding with psychology and proposes new datasets and evaluation frameworks to better capture multi-label emotional salience. Overall, the findings highlight a potential anti-social feedback loop in generative AI outputs and emphasize the need for safeguards and methodological advances to reduce negative-emotion bias in digital media.
Abstract
This paper examines potential biases and inconsistencies in the emotions evoked by images produced by generative artificial intelligence (AI) models, in particular a potential bias toward negative emotions. We assess this bias by comparing the emotions evoked by an AI-produced image to the emotions evoked by the prompt used to create it. As a first step, the study evaluates three approaches for identifying emotions in images (traditional supervised learning, zero-shot learning with vision-language models, and cross-modal auto-captioning) using EmoSet, a large dataset of image-emotion annotations that categorizes images into eight emotion categories. Results show that fine-tuned models, particularly Google's Vision Transformer (ViT), significantly outperform zero-shot and caption-based methods in recognizing emotions in images. For a cross-modality comparison, we then analyze the differences between the emotions in text prompts, identified via existing text-based emotion-recognition models, and the emotions evoked by the resulting images. Findings indicate that AI-generated images frequently lean toward negative emotional content, regardless of the original prompt. This emotional skew in generative models could amplify negative affective content in digital spaces, perpetuating its prevalence and impact. The study advocates for a multidisciplinary approach to better align AI emotion recognition with psychological insights and address potential biases in generative AI outputs across digital media.
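The cross-modal comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each prompt and its generated image have already been assigned a single label from a shared eight-category set, and that the categories split into four positive and four negative emotions as listed below. The function names (`negative_share`, `emotion_shift`) and the example labels are hypothetical.

```python
from collections import Counter

# Assumed eight-category label set, split by valence (an assumption of this sketch).
POSITIVE = {"amusement", "awe", "contentment", "excitement"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}


def negative_share(labels):
    """Fraction of labels that fall in the negative-valence group."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label in NEGATIVE) / len(labels)


def emotion_shift(prompt_labels, image_labels):
    """Compare paired prompt and image labels.

    prompt_labels[i] is the emotion predicted for prompt i, and
    image_labels[i] is the emotion predicted for the image generated from it.
    """
    if len(prompt_labels) != len(image_labels):
        raise ValueError("prompt and image label lists must be paired")
    n = len(prompt_labels)
    prompt_counts = Counter(prompt_labels)
    image_counts = Counter(image_labels)
    # Per-emotion change in relative frequency (image minus prompt).
    per_emotion = {
        emotion: (image_counts[emotion] - prompt_counts[emotion]) / n
        for emotion in sorted(POSITIVE | NEGATIVE)
    }
    # Share of pairs where a positive prompt produced a negative image.
    flipped = sum(
        1
        for p, i in zip(prompt_labels, image_labels)
        if p in POSITIVE and i in NEGATIVE
    ) / n
    return {
        "negative_share_delta": negative_share(image_labels)
        - negative_share(prompt_labels),
        "per_emotion_delta": per_emotion,
        "positive_to_negative_rate": flipped,
    }


# Hypothetical labels for four prompt/image pairs (illustrative only).
prompts = ["contentment", "awe", "fear", "amusement"]
images = ["fear", "awe", "fear", "sadness"]
result = emotion_shift(prompts, images)
print(result["negative_share_delta"], result["per_emotion_delta"]["fear"])
```

A positive `negative_share_delta` would indicate that images carry more negative labels than their prompts. In practice, text-based emotion models often use a different label set than image datasets, so a mapping between the two would be needed before this comparison applies.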
