Gesture Classification in Artworks Using Contextual Image Features
Azhar Hussian, Mathias Zinnen, Thi My Hang Tran, Andreas Maier, Vincent Christlein
TL;DR
The paper addresses smell gesture recognition in historical artworks in a low-data, class-imbalanced setting. It introduces a two-branch architecture that processes cropped person regions and the full-scene context in parallel, then fuses the two representations with a four-layer fully connected neural network (FCNN) to classify six smell-gesture categories; inference relies on pre-detected persons. Including context consistently improves $F_1$ scores across backbones, although Transformer models underperform CNN backbones, likely because of pretraining limitations. The work advances computational art history by enabling automatic interpretation of less commonly studied senses in artworks, and it points to future work on multimodal fusion with pose keypoints and on expanding the dataset to broader activities to improve generalization.
Abstract
Recognizing gestures in artworks can add a valuable dimension to art understanding and help acknowledge the role of the sense of smell in cultural heritage. We propose a method to recognize smell gestures in historical artworks. We show that combining local features with global image context notably improves classification performance across different backbones.
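To make the idea of combining local features with global context concrete, the following is a minimal sketch of a late-fusion classification head, not the authors' implementation. It assumes that the two branch outputs are fused by simple concatenation and that "four-layer" means four dense layers in total (three hidden layers plus the output layer); neither detail is confirmed by the summary. The feature dimensions, hidden sizes, random weights, and all function names are hypothetical, and the backbone outputs are replaced by random stand-in vectors.

```python
import random

NUM_CLASSES = 6  # six smell-gesture categories, as stated in the summary


def linear(x, weights, bias):
    """Dense layer: one output per weight row, y_j = w_j . x + b_j."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Random weights for illustration only; a real model would learn them."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_fusion_head(person_dim, context_dim, hidden=(64, 32, 16), seed=0):
    """Four dense layers: three hidden (hypothetical sizes) plus the output layer."""
    rng = random.Random(seed)
    sizes = [person_dim + context_dim, *hidden, NUM_CLASSES]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def classify(person_feat, context_feat, head):
    """Concatenate both branch features, then apply the fully connected head."""
    x = list(person_feat) + list(context_feat)  # assumed fusion: concatenation
    for i, (weights, bias) in enumerate(head):
        x = linear(x, weights, bias)
        if i < len(head) - 1:  # no activation after the final (logit) layer
            x = relu(x)
    return x  # six unnormalized class scores


# Stand-ins for backbone outputs: in practice these would come from a CNN or
# Transformer applied to the person crop and to the full image, respectively.
rng = random.Random(1)
person_feat = [rng.random() for _ in range(8)]
context_feat = [rng.random() for _ in range(8)]

head = build_fusion_head(len(person_feat), len(context_feat))
logits = classify(person_feat, context_feat, head)
predicted_class = max(range(NUM_CLASSES), key=lambda k: logits[k])
```

In this sketch the head sees both feature vectors at once, so a gesture that is ambiguous in the person crop alone can in principle be disambiguated by the surrounding scene.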
