Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings
Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Marcel Worring, Nachoem Wijnberg
TL;DR
ArtSAGENet introduces a knowledge-enhanced, multimodal framework that fuses CNN-based visual features with GNN-modeled semantic relationships among paintings to improve fine art analysis. By employing scalable graph neural networks and multi-task learning, the approach achieves state-of-the-art results on style classification, artist attribution, and creation-period estimation on WikiArt variants, while requiring little labeled data and reduced training time. The method yields both quantitative gains and qualitative improvements in retrieval, underscoring the value of integrating visual content with semantic context for art analysis and curation. This work paves the way for knowledge-aware art understanding and efficient curation of large-scale art collections.
Abstract
We propose ArtSAGENet, a novel multimodal architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs) to jointly learn visual and semantic-based artistic representations. First, we illustrate the significant advantages of multi-task learning for fine art analysis and argue that it is conceptually a much more appropriate setting in the fine art domain than single-task alternatives. We further demonstrate that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style classification, artist attribution, creation period estimation, and tag prediction, while training them requires an order of magnitude less computational time and only a small amount of labeled data. Finally, through extensive experimentation we show that our proposed ArtSAGENet captures and encodes valuable relational dependencies between the artists and the artworks, surpassing the performance of traditional methods that rely solely on the analysis of visual content. Our findings underline the great potential of integrating visual content and semantics for fine art analysis and curation.
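To make the general idea of combining a CNN embedding with a GNN embedding under multi-task learning more concrete, here is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the ArtSAGENet implementation: the graph layer uses a GraphSAGE-style mean aggregator in the form h'_v = ReLU(W_self h_v + W_neigh · mean_{u in N(v)} h_u) (the same form used by PyTorch Geometric's `SAGEConv`), fusion is assumed to be simple concatenation, every task (including creation period) is assumed to be a softmax classification head, and the per-task losses are combined as a weighted sum. All function names, the toy graph, and the weights are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mean_aggregate(node: str, features: Dict[str, Vector],
                   adjacency: Dict[str, List[str]]) -> Vector:
    """Mean of neighbour features (zero vector for isolated nodes)."""
    neighbours = adjacency.get(node, [])
    dim = len(features[node])
    if not neighbours:
        return [0.0] * dim
    return [sum(features[n][i] for n in neighbours) / len(neighbours)
            for i in range(dim)]


def sage_layer(node: str, features: Dict[str, Vector],
               adjacency: Dict[str, List[str]],
               w_self: Matrix, w_neigh: Matrix) -> Vector:
    """One GraphSAGE-style layer: ReLU(W_self h_v + W_neigh mean(h_u))."""
    h_self = matvec(w_self, features[node])
    h_neigh = matvec(w_neigh, mean_aggregate(node, features, adjacency))
    return [max(0.0, a + b) for a, b in zip(h_self, h_neigh)]


def fuse(cnn_embedding: Vector, graph_embedding: Vector) -> Vector:
    """Assumed late fusion: concatenate visual and relational embeddings."""
    return cnn_embedding + graph_embedding


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(fused: Vector, heads: Dict[str, Matrix],
                   targets: Dict[str, int],
                   task_weights: Dict[str, float]) -> float:
    """Weighted sum of per-task classification losses on a shared embedding."""
    return sum(task_weights[task] * cross_entropy(matvec(w, fused), targets[task])
               for task, w in heads.items())


# Toy usage: three hypothetical artwork nodes with 2-d features.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, 0.5]]

graph_emb = sage_layer("a", features, adjacency, w_self, w_neigh)  # [1.25, 0.5]
cnn_emb = [0.2, -0.1, 0.3]  # stand-in for a CNN image embedding
fused = fuse(cnn_emb, graph_emb)  # 5-d shared representation

# Untrained (all-zero) heads give uniform predictions, so each task
# contributes log(num_classes) to the loss.
heads = {"style": [[0.0] * 5 for _ in range(3)],
         "artist": [[0.0] * 5 for _ in range(4)]}
loss = multitask_loss(fused, heads, {"style": 0, "artist": 1},
                      {"style": 1.0, "artist": 1.0})  # equals math.log(3) + math.log(4)
```

Because all tasks share the fused representation, gradients from every head would update the same encoder during training. This shared encoder is what makes the multi-task setting attractive when labels such as style, artist, and period are correlated.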
