ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification
Mridula Vijendran, Frederick W. B. Li, Jingjing Deng, Hubert P. H. Shum
TL;DR
This work tackles cross-domain bias in painting classification by introducing AdaIN-based data augmentation that stylizes class-specific samples to bridge the gap between real-world images and paintings. A spatial-attention classifier fuses multi-scale feature maps, and a two-stage optimization workflow (grid search followed by Bayesian search, with gradual unfreezing) refines both the augmentation choices and the model parameters, which enables effective handling of class imbalance. Empirical results on the Kaokore dataset show competitive accuracy (87.24% with a ResNet-50 backbone in 40 epochs), and qualitative attention analyses add complementary value for interpretability. The approach offers practical benefits for gallery-style recommendation systems and can be applied to other painting corpora; future directions include larger datasets and the integration of geometric priors.
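The TL;DR names gradual unfreezing but not its schedule. As a minimal sketch of the general technique, assuming one additional backbone stage is unfrozen every `epochs_per_stage` epochs, starting at the classification head and moving toward the input, the helper below returns the stages that are trainable at a given epoch. The stage labels in `STAGES`, the function name `trainable_stages`, and the two-epoch interval are hypothetical placeholders, not values taken from the paper.

```python
from typing import List

# Hypothetical stage labels for a ResNet-50-style classifier, ordered input -> output.
STAGES = ["stem", "layer1", "layer2", "layer3", "layer4", "head"]


def trainable_stages(epoch: int, stages: List[str], epochs_per_stage: int = 2) -> List[str]:
    """Stages to train at a 0-based epoch under a gradual-unfreezing schedule.

    Only the last stage (the head) trains at first; every `epochs_per_stage`
    epochs one more stage is unfrozen, moving back toward the input, until
    the whole network is trainable.
    """
    # Number of stages unfrozen so far, capped at the total number of stages.
    n_unfrozen = min(len(stages), 1 + epoch // epochs_per_stage)
    # Take the last n_unfrozen stages (closest to the output).
    return stages[len(stages) - n_unfrozen:]


for epoch in (0, 2, 4, 10):
    print(epoch, trainable_stages(epoch, STAGES))
```

In a PyTorch training loop, one would set `requires_grad` on each parameter according to whether its stage appears in the returned list, and update the optimizer's parameter groups whenever that set changes. Unfreezing from the head first lets the randomly initialized classifier settle before the pretrained low-level features are perturbed.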
Abstract
Painting classification plays a vital role in organizing, finding, and recommending artwork for digital and traditional art galleries. Existing methods struggle to adapt knowledge learned from real-world images to artistic images during training, which leads to poor performance across different datasets. We address these challenges through a two-step process. First, we generate additional training data using style transfer with Adaptive Instance Normalization (AdaIN), bridging the gap between diverse styles. Second, we strengthen the classifier with feature-map adaptive spatial attention modules, improving its understanding of artistic details. We also tackle imbalanced class representation by dynamically adjusting the augmented samples for each class. Through a two-stage process of careful hyperparameter search and model fine-tuning, we achieve 87.24% accuracy on the Kaokore dataset using a ResNet-50 backbone over 40 training epochs. Our quantitative analyses compare different pretrained backbones, examine model optimization through ablation studies, and measure how varying augmentation levels affect model performance. Complementing these, our qualitative experiments offer insight into the model's decision-making through spatial attention and into its ability to distinguish easy from challenging samples based on confidence ranking.
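For reference, the standard AdaIN operation aligns the per-channel mean and standard deviation of content features x with those of style features y: AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), where μ and σ are computed over the spatial positions of each channel. The NumPy sketch below implements only this feature-space step for a single (C, H, W) feature map. It omits the encoder and decoder that map images to and from feature space, and the `eps` stabilizer and the `alpha` content-style blend in the hypothetical helper `stylize_features` follow common practice rather than anything stated in this abstract.

```python
import numpy as np


def adain(content_feat: np.ndarray, style_feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive Instance Normalization on feature maps shaped (C, H, W)."""
    # Per-channel statistics over the spatial axes (H, W); keepdims keeps them broadcastable.
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content_feat.var(axis=(1, 2), keepdims=True) + eps)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = np.sqrt(style_feat.var(axis=(1, 2), keepdims=True) + eps)
    # Normalize each content channel, then rescale and shift it with the style statistics.
    normalized = (content_feat - c_mean) / c_std
    return s_std * normalized + s_mean


def stylize_features(content_feat: np.ndarray, style_feat: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Blend content and AdaIN features: alpha=0 keeps the content, alpha=1 is full stylization."""
    return (1.0 - alpha) * content_feat + alpha * adain(content_feat, style_feat)


# Usage on random stand-in features (not real encoder outputs).
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(3, 8, 8))
style = rng.normal(5.0, 2.0, size=(3, 8, 8))
out = adain(content, style)
print(out.mean(axis=(1, 2)))  # matches style.mean(axis=(1, 2)) up to floating-point error
```

Because AdaIN has no learned parameters, a single pretrained encoder-decoder pair can stylize a sample toward any style image, which is what makes it practical for generating augmented training data on the fly.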
