ArtContext: Contextualizing Artworks with Open-Access Art History Articles and Wikidata Knowledge through a LoRA-Tuned CLIP Model
Samuel Waugh, Stuart James
TL;DR
ArtContext tackles the challenge of grounding paintings in scholarly prose by linking artworks to sentences from open-access art-history articles and Wikidata metadata. It constructs a large-scale, weakly supervised training pipeline that ingests 27,044 open-access articles across 450 artists from OpenAlex, extracts candidate contexts with Sentence-BERT, and aligns them to paintings using Wikidata-informed queries to produce 29,697 image–text pairs. These pairs train PaintingCLIP, a LoRA-adapted version of CLIP for domain-specific grounding, achieving improved retrieval performance over vanilla CLIP while preserving zero-shot capabilities. The approach demonstrates that weak, scalable supervision from scholarly text can adapt vision–language models for nuanced humanities tasks and is readily generalizable to other domains with rich metadata and textual corpora.
Abstract
Many art history articles discuss artworks in general as well as specific aspects of them, such as layout, iconography, or material culture. However, when viewing an artwork, it is not trivial to identify what different articles have said about the piece. We therefore propose ArtContext, a pipeline that takes a corpus of open-access art history articles together with Wikidata knowledge and annotates artworks with this information. We build the corpus with a novel collection pipeline, then adapt a CLIP model to the domain using Low-Rank Adaptation (LoRA). We show that the resulting model, PaintingCLIP, which is weakly supervised by the collected corpus, outperforms CLIP and provides context for a given artwork. The proposed pipeline is generalisable and can be readily applied to numerous humanities areas.
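The low-rank update behind LoRA can be illustrated with a minimal sketch. It is not the authors' implementation: the matrix sizes, the rank `r`, the scaling factor `alpha`, and the helper names `matmul`, `lora_weight`, and `apply` are all assumptions chosen for illustration, and this section does not state which CLIP layers PaintingCLIP adapts or with what hyperparameters. The sketch only shows the general idea that a frozen weight matrix W is adapted as W + (alpha / r) * B A, where only the small matrices A and B are trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return the adapted weight W + (alpha / r) * B @ A.

    w: frozen weight, shape (d_out, d_in)
    a: trainable factor A, shape (r, d_in)
    b: trainable factor B, shape (d_out, r)
    """
    r = len(a)                # rank of the update = number of rows of A
    delta = matmul(b, a)      # (d_out, r) @ (r, d_in) -> (d_out, d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def apply(w, x):
    """Apply the linear map w to the vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


# Illustrative sizes only (assumed, not taken from the paper).
rng = random.Random(0)
d_out, d_in, r, alpha = 4, 6, 2, 4.0

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable
B = [[0.0] * r for _ in range(d_out)]                                # trainable, starts at zero

x = [rng.gauss(0, 1) for _ in range(d_in)]

# Because B starts at zero, the adapted layer initially matches the base layer.
base_out = apply(W, x)
adapted_out = apply(lora_weight(W, A, B, alpha), x)

# Trainable parameters: the two low-rank factors versus the full matrix.
lora_params = r * (d_in + d_out)
full_params = d_in * d_out
```

With B initialised to zero the adapted layer reproduces the base layer exactly, so fine-tuning starts from the pretrained model's behaviour. The parameter saving is negligible at these toy sizes (20 versus 24) but grows quickly for the large projection matrices found in models such as CLIP.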
