Smell and Emotion: Recognising emotions in smell-related artworks

Vishal Patoliya, Mathias Zinnen, Andreas Maier, Vincent Christlein

TL;DR

This work addresses the problem of recognizing person-level emotions in smell-related artworks, a domain with limited prior work and a notable domain gap to photographic emotion datasets. It adopts a two-branch CNN architecture inspired by EMOTIC to fuse person-centered and scene-context cues, trained with a combination of discrete emotion classification and continuous valence/arousal/dominance regression, and evaluates on both EMOTIC and the ODOR-e artwork test set. A stylized version of EMOTIC (EMOTIC-s), generated with internal-external learning style transfer, is used to bridge the gap between natural and artistic imagery, and hyperparameter tuning (body crop size, pretrained context backbones) is explored for performance gains. The findings show that emotion recognition in artworks is feasible, but performance lags behind natural images because of domain differences. Modest improvements are achievable through tuning and style transfer, but robust cross-domain emotion recognition in art requires larger artistic datasets and better style-transfer methods that preserve facial and expressive cues. This work lays groundwork for interdisciplinary exploration at the intersection of olfactory references, emotion analysis, and computational humanities.
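The combination of discrete classification and continuous regression mentioned above can be illustrated with a minimal sketch. The summary does not state the exact loss terms or their weights, so the binary cross-entropy and mean squared error used here, the function names, and the equal weights `w_disc` and `w_cont` are all assumptions chosen for illustration, not the authors' training objective.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over emotion categories (multi-label).

    Assumed stand-in for the discrete loss; uses the numerically stable form
    max(z, 0) - z * t + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mse(pred, target):
    """Mean squared error; assumed stand-in for the continuous (VAD) loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(cat_logits, cat_targets, vad_pred, vad_target,
                  w_disc=0.5, w_cont=0.5):
    """Weighted sum of the discrete and continuous terms (weights hypothetical)."""
    discrete = bce_with_logits(cat_logits, cat_targets)
    continuous = mse(vad_pred, vad_target)
    return w_disc * discrete + w_cont * continuous


if __name__ == "__main__":
    # Toy example: 3 emotion categories, 3 continuous dimensions (V, A, D).
    loss = combined_loss([2.0, -1.0, 0.5], [1, 0, 0],
                         [0.6, 0.4, 0.5], [0.7, 0.3, 0.5])
    print(f"combined loss: {loss:.4f}")
```

A weighted sum like this lets a single backward pass update both prediction heads; how the two terms are actually balanced would have to be taken from the paper itself.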

Abstract

Emotions and smell are underrepresented in digital art history. In this exploratory work, we show that recognising emotions from smell-related artworks is technically feasible but has room for improvement. Using style transfer and hyperparameter optimization we achieve a minor performance boost and open up the field for future extensions.

Paper Structure (10 sections, 3 figures, 3 tables)

This paper contains 10 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Example of style transfer using internal-external learning. Image (a) is the photographic image from the EMOTIC dataset to which the style transfer is applied. Image (b) is the artwork image from WikiArt used to extract style features. Image (c) is the output image, in which the main body or object content comes from Image (a) and the colouring effect comes from Image (b).
  • Figure 2: Model architecture for Emotion Recognition in Context which uses two different branches to extract body and context features separately. In the fusion network, both features are merged for the final prediction (Discrete and Continuous dimensions). Figure taken from emotic_pami2019 with permission from the authors.
  • Figure 3: Examples of misclassifications from the ODOR-e dataset where the ground truth (GT) emotion and the predicted emotion (Pred) are completely different. Image credits (left to right): Merry man holding a pewter jug and a pipe. Circle of Frans Hals. 1638–1640. https://rkd.nl/en/explore/images/302304, Portrait of Arnold Aletrino. Jan Veth. 1885. https://rkd.nl/en/explore/images/20797.
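
The two-branch design described in the Figure 2 caption can be sketched as a forward pass over already-extracted features. This is a toy illustration in plain Python, not the authors' network: merging by concatenation, the single hidden layer, the parameter names in `params`, and all dimensions are assumptions, and the body and context feature extractors (CNN backbones in the paper) are replaced by precomputed vectors.

```python
def linear(x, weights, bias):
    """Dense layer: one output per weight row, y_i = w_i . x + b_i."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def fuse_and_predict(body_feat, context_feat, params):
    """Merge body and context features, then predict with two heads.

    Returns (category_logits, vad), i.e. the discrete and continuous outputs.
    Concatenation as the merge step is an assumption of this sketch.
    """
    fused = list(body_feat) + list(context_feat)
    hidden = relu(linear(fused, params["fusion_w"], params["fusion_b"]))
    cat_logits = linear(hidden, params["cat_w"], params["cat_b"])
    vad = linear(hidden, params["vad_w"], params["vad_b"])
    return cat_logits, vad


if __name__ == "__main__":
    # Hypothetical toy parameters: 2 body features, 1 context feature,
    # 2 hidden units, 2 emotion categories, 3 continuous dimensions.
    params = {
        "fusion_w": [[1, 0, 0], [0, 1, 1]], "fusion_b": [0, -10],
        "cat_w": [[2, 0], [0, 3]], "cat_b": [0, 1],
        "vad_w": [[1, 1], [1, 0], [0, 1]], "vad_b": [0, 0, 0],
    }
    print(fuse_and_predict([1.0, 2.0], [3.0], params))
```

Keeping the two heads on a shared fused representation is what allows the person crop and the surrounding scene to inform both the discrete and the continuous prediction.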