Enhancing Historical Image Retrieval with Compositional Cues
Tingyu Lin, Robert Sablatnig
TL;DR
The paper tackles the limited effectiveness of semantic-only image retrieval in historical archives by introducing composition as a complementary cue. It presents a dual-network approach: a Composition Clues Network (CCNet) that learns a Key Composition Map (KCM) from grayscale historical cues, and a Content-Based Image Retrieval Network (CBIRNet) that fuses content features with KCM-guided information using a tunable weight $L_{KCM}$. Trained on the KU-PCP composition dataset and evaluated on the HISTORIAN historical video dataset, the method demonstrates improved retrieval performance and perceptual alignment with target images, both quantitatively and qualitatively. This work highlights the practical value of incorporating computational aesthetics into CBIR for cultural heritage analytics and suggests directions for richer fusion strategies and dataset design.
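To make the fusion idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the content features form a `(C, H, W)` map and the KCM is an `(H, W)` non-negative map, and it uses a scalar `weight` as a stand-in for the tunable weight $L_{KCM}$. The function names `kcm_guided_descriptor` and `rank_by_cosine`, the pooling scheme, and the additive fusion are all illustrative assumptions; the paper's actual fusion may differ.

```python
import numpy as np

def kcm_guided_descriptor(feature_map, kcm, weight):
    """Fuse a plain content descriptor with a KCM-weighted one (hypothetical).

    feature_map: (C, H, W) content features from some backbone (assumed shape)
    kcm:         (H, W) non-negative composition map (assumed shape)
    weight:      scalar controlling how much the KCM-guided term contributes
    """
    # Plain content descriptor: global average pooling over spatial positions.
    content = feature_map.mean(axis=(1, 2))
    # KCM-guided descriptor: spatial average weighted by the normalized map.
    attn = kcm / (kcm.sum() + 1e-8)
    guided = (feature_map * attn[None, :, :]).sum(axis=(1, 2))
    # Additive fusion (an assumption), followed by L2 normalization.
    fused = content + weight * guided
    return fused / (np.linalg.norm(fused) + 1e-8)

def rank_by_cosine(query, gallery):
    """Return gallery indices sorted from most to least similar to the query.

    Assumes query and gallery rows are already L2-normalized, so the dot
    product equals cosine similarity.
    """
    scores = gallery @ query
    return np.argsort(-scores)
```

With `weight = 0` the descriptor reduces to the content-only baseline, so the same code path can be used to compare retrieval with and without the compositional cue.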
Abstract
When analyzing vast amounts of digitally stored historical image data, existing content-based retrieval methods often overlook significant non-semantic information, limiting their effectiveness for flexible exploration across varied themes. To broaden the applicability of image retrieval methods for diverse purposes and uncover more general patterns, we introduce a crucial factor from computational aesthetics, namely image composition, into this topic. By explicitly integrating composition-related information extracted by a CNN into the retrieval model, our method considers both an image's composition rules and its semantic information. Qualitative and quantitative experiments demonstrate that the image retrieval network guided by composition information outperforms those relying solely on content information, making it easier to find database images that are closer to the target image in human perception. Code is available at https://github.com/linty5/CCBIR.
