Decoding Emotions in Abstract Art: Cognitive Plausibility of CLIP in Recognizing Color-Emotion Associations

Hanna-Sophia Widhoelzl, Ece Takmaz

TL;DR

This work examines the cognitive plausibility of CLIP in decoding emotions elicited by abstract art using the FeelingBlue dataset, combining zero-shot emotion classification, linguistic analysis of rationales, and color-emotion interaction studies. By comparing image- and rationale-based predictions and applying similarity-based prediction over neighboring embeddings, the authors quantify how closely CLIP's representations align with human emotion judgments: zero-shot performance on images is modest, while leveraging neighboring embeddings yields substantial gains (approximately 47.5% accuracy). Rationales yield higher zero-shot accuracy than images, and color patterns in both images and rationales reveal emotion-color associations that, in CLIP's predictions, often align more closely with the literature than with the human labels. Overall, the results highlight a meaningful gap between human cognitive processing of emotions in abstract art and current cross-modal models, informing future cognitive-modeling directions and prompting exploration of multi-label and theory-informed approaches.

Abstract

This study investigates the cognitive plausibility of a pretrained multimodal model, CLIP, in recognizing emotions evoked by abstract visual art. We employ a dataset comprising images with associated emotion labels and textual rationales for these labels provided by human annotators. We perform linguistic analyses of rationales and zero-shot emotion classification of images and rationales, apply similarity-based prediction of emotion, and investigate color-emotion associations. The relatively low, yet above-baseline, accuracy in recognizing emotion for abstract images and rationales suggests that CLIP decodes emotional complexities in a manner not well aligned with human cognitive processes. Furthermore, we explore color-emotion interactions in images and rationales. Expected color-emotion associations, such as red relating to anger, are identified in images and texts annotated with emotion labels by both humans and CLIP, with the latter showing even stronger interactions. Our results highlight the disparity between human processing and machine processing when connecting image features and emotions.
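
To make the two prediction modes named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that image, rationale, and label-prompt embeddings have already been extracted from CLIP, and stands in toy 3-dimensional vectors for them. The function names (`zero_shot_label`, `neighbor_label`), the neighborhood size `k`, the majority-vote rule, and the three emotion labels shown are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_label(item_emb, label_embs):
    """Zero-shot: pick the label whose prompt embedding is most similar to the item."""
    return max(label_embs, key=lambda label: cosine(item_emb, label_embs[label]))


def neighbor_label(item_emb, reference, k=3):
    """Similarity-based: majority vote over the k most similar labeled embeddings.

    `reference` is a list of (embedding, human_label) pairs; k and the voting
    rule are assumptions made for this sketch.
    """
    ranked = sorted(reference, key=lambda pair: cosine(item_emb, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for CLIP embeddings (hypothetical values, not real model output).
label_embs = {
    "anger": [1.0, 0.0, 0.0],
    "happiness": [0.0, 1.0, 0.0],
    "sadness": [0.0, 0.0, 1.0],
}
reference = [
    ([0.9, 0.1, 0.0], "anger"),
    ([0.8, 0.2, 0.1], "anger"),
    ([0.1, 0.9, 0.0], "happiness"),
    ([0.0, 0.2, 0.9], "sadness"),
]
query = [0.7, 0.3, 0.1]

zero_shot = zero_shot_label(query, label_embs)
knn = neighbor_label(query, reference, k=3)
print(zero_shot, knn)
```

In this sketch the zero-shot path uses only the label prompts, whereas the neighbor path borrows human labels from nearby items, which is one plausible reading of why neighboring embeddings could raise accuracy.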

Paper Structure (22 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: Emotion allocation for 'Composizione' by Antonio Sanfilippo (1955), from FeelingBlue (Ananthram et al., 2023).
  • Figure 2: Color-emotion associations in human-annotated images with frequency counts per emotion category.
  • Figure 3: Color-emotion associations in CLIP-annotated images with frequency counts per emotion category.
  • Figure 4: Color-emotion associations in human-annotated rationales with frequency counts per emotion category.
  • Figure 5: Color-emotion associations in CLIP-annotated rationales with frequency counts per emotion category.
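
The frequency counts per emotion category shown for rationales (Figures 4 and 5) could be tallied along the following lines. This is a hedged sketch only: the color lexicon `COLOR_WORDS`, the tokenization, and the sample rationales are invented for illustration and are not taken from the paper or from FeelingBlue.

```python
import re
from collections import Counter, defaultdict

# Hypothetical color lexicon; the paper's actual color vocabulary is not given here.
COLOR_WORDS = {"red", "blue", "green", "yellow", "black", "white"}


def color_emotion_counts(rationales):
    """Count color-word occurrences per emotion label.

    `rationales` is an iterable of (text, emotion_label) pairs, where the label
    may come from human annotators or from a model such as CLIP.
    """
    counts = defaultdict(Counter)
    for text, emotion in rationales:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in COLOR_WORDS:
                counts[emotion][token] += 1
    return counts


# Invented example rationales, for illustration only.
samples = [
    ("The harsh red strokes feel aggressive.", "anger"),
    ("Red and black clash violently.", "anger"),
    ("Soft blue tones feel lonely.", "sadness"),
]
counts = color_emotion_counts(samples)
for emotion, color_counts in counts.items():
    print(emotion, dict(color_counts))
```

Running the same tally once with human labels and once with model-assigned labels would give two tables that can be compared category by category.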