Gesture Classification in Artworks Using Contextual Image Features

Azhar Hussian, Mathias Zinnen, Thi My Hang Tran, Andreas Maier, Vincent Christlein

TL;DR

The paper addresses smell gesture recognition in historical artworks under a low-data, imbalanced setting. It introduces a two-branch architecture that processes the cropped person region and the full-scene context in parallel, then fuses the two representations with a four-layer fully connected network to classify six smell-gesture categories. Inference relies on pre-detected persons. Including context consistently improves F1 scores across backbones, although Transformer backbones underperform CNN backbones, likely due to pretraining limitations. The work advances computational art history by enabling automatic interpretation of a rarely studied sense in artworks, and it points to future work on multimodal fusion with pose keypoints and on expanding the dataset to broader activities to improve generalization.
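To illustrate why an F1 score is a sensible metric in an imbalanced setting like the one described above, here is a minimal sketch of a macro-averaged F1 computation. The summary does not state which F1 averaging the paper uses, so macro averaging is an assumption here, and the function name `macro_f1` and the toy labels are hypothetical.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present contributes 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy, made-up labels: class 0 dominates, and the "model" always predicts it.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, num_classes=3)
print(f"accuracy={accuracy:.3f}, macro F1={score:.3f}")
```

In this toy case the majority-class predictor reaches two-thirds accuracy but a much lower macro F1, because the two rare classes each contribute a score of zero.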

Abstract

Recognizing gestures in artworks can add a valuable dimension to art understanding and help to acknowledge the role of the sense of smell in cultural heritage. We propose a method to recognize smell gestures in historical artworks. We show that combining local features with global image context improves classification performance notably on different backbones.

Paper Structure

This paper contains 5 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Example of each class in the SniffyArt dataset (Zinnen et al., 2023).
  • Figure 2: Class distribution excluding the background class. Note that the dataset has not only a small number of samples per class but also a significant class imbalance. Figure taken from Zinnen et al. (2023), with permission to reuse granted by the authors.
  • Figure 3: Architecture diagram for the proposed model. The cropped person and the full context image are passed through separate backbones. Finally, the outputs of these backbones are concatenated and passed to the classifier.
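
The two-branch design described in the Figure 3 caption can be sketched roughly as follows in PyTorch. This is an illustrative sketch under stated assumptions, not the authors' implementation: `tiny_backbone` is a hypothetical stand-in for the pretrained CNN or Transformer backbones, and the feature size, hidden size, activation choice, and input resolutions are all assumed. Only the overall structure (separate backbones, concatenation, a four-layer fully connected classifier, six gesture classes) follows the summary above.

```python
import torch
from torch import nn


def tiny_backbone(out_dim: int) -> nn.Module:
    """Hypothetical stand-in for a pretrained image backbone."""
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # global average pooling -> (B, 16, 1, 1)
        nn.Flatten(),             # -> (B, 16)
        nn.Linear(16, out_dim),   # -> (B, out_dim)
    )


class TwoBranchGestureClassifier(nn.Module):
    """Person-crop branch plus full-image context branch, fused by concatenation."""

    def __init__(self, feat_dim: int = 64, hidden_dim: int = 128, num_classes: int = 6):
        super().__init__()
        self.person_backbone = tiny_backbone(feat_dim)
        self.context_backbone = tiny_backbone(feat_dim)
        # Four linear layers; the widths and ReLU activations are assumptions.
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, person_crop: torch.Tensor, full_image: torch.Tensor) -> torch.Tensor:
        person_feat = self.person_backbone(person_crop)
        context_feat = self.context_backbone(full_image)
        fused = torch.cat([person_feat, context_feat], dim=1)  # (B, 2 * feat_dim)
        return self.classifier(fused)  # unnormalized class logits


model = TwoBranchGestureClassifier()
person_crop = torch.randn(2, 3, 128, 64)   # batch of 2 cropped person regions
full_image = torch.randn(2, 3, 224, 224)   # the corresponding full artworks
logits = model(person_crop, full_image)
print(logits.shape)
```

Because each branch ends in global pooling, the crop and the full image may have different resolutions; the two feature vectors only need matching batch sizes before concatenation.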