Enhancing Historical Image Retrieval with Compositional Cues

Tingyu Lin, Robert Sablatnig

TL;DR

The paper tackles the limited effectiveness of semantic-only image retrieval in historical archives by introducing composition as a complementary cue. It presents a dual-network approach: a Composition Clues Network (CCNet) that learns a Key Composition Map (KCM) from grayscale historical cues, and a Content-Based Image Retrieval Network (CBIRNet) that fuses content features with KCM-guided information using a tunable weight $L_{KCM}$. Trained on the KU-PCP composition dataset and evaluated on the HISTORIAN historical video dataset, the method demonstrates improved retrieval performance and perceptual alignment with target images, both quantitatively and qualitatively. This work highlights the practical value of incorporating computational aesthetics into CBIR for cultural heritage analytics and suggests directions for richer fusion strategies and dataset design.
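
To make the role of the tunable weight concrete, the following is a minimal sketch of one way a weight such as $L_{KCM}$ could blend a content feature with a KCM-guided feature. The convex-combination form, the L2 normalisation step, and the names `fuse_features`, `content_feat`, and `kcm_feat` are assumptions made for illustration; the summary above does not specify the fusion used in CBIRNet.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(content_feat, kcm_feat, l_kcm=0.5):
    """Blend two feature vectors with a weight in [0, 1].

    ASSUMPTION: a convex combination of L2-normalised vectors. This is an
    illustrative stand-in, not the fusion defined in the paper.
    l_kcm = 0 keeps only the content feature; l_kcm = 1 keeps only the
    KCM-guided feature.
    """
    if not 0.0 <= l_kcm <= 1.0:
        raise ValueError("l_kcm must lie in [0, 1]")
    if len(content_feat) != len(kcm_feat):
        raise ValueError("feature vectors must have the same length")
    content = l2_normalize(content_feat)
    guided = l2_normalize(kcm_feat)
    return [(1.0 - l_kcm) * c + l_kcm * g for c, g in zip(content, guided)]
```

Under this assumption, setting `l_kcm` to 0 reduces the model to a content-only baseline, which is the kind of comparison the summary describes.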

Abstract

When analyzing vast amounts of digitally stored historical image data, existing content-based retrieval methods often overlook significant non-semantic information, which limits their effectiveness for flexible exploration across varied themes. To broaden the applicability of image retrieval methods and uncover more general patterns, we introduce image composition, a key factor from computational aesthetics, into historical image retrieval. By explicitly integrating composition-related information extracted by a CNN into the retrieval model, our method considers both an image's composition rules and its semantic information. Qualitative and quantitative experiments demonstrate that the retrieval network guided by composition information outperforms those relying solely on content information, and helps identify database images that are closer to the target image in human perception. Please visit https://github.com/linty5/CCBIR to try our code.

Paper Structure

This paper contains 16 sections, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Illustration of converting historical films into pairs of images for retrieval experiments.
  • Figure 2: Illustration of our proposed CCNet and CBIRNet.
  • Figure 3: Visualisation of KCM effect. The top row features the original grayscale images, and the bottom row highlights the KCM, pinpointing key compositional areas as detected by our model.
  • Figure 4: Scatter plot and histogram of positive and negative samples when $L_{KCM}$ is 0.5.
  • Figure 5: Comparison of retrieval results with different $L_{KCM}$. We selected only the central frame image from each shot in the test set as the target database for retrieval, returning the five highest similarity-scored images for a single image query.
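
The retrieval protocol in the Figure 5 caption (one central frame per shot as the database, five highest-scoring images returned per query) can be sketched as follows. Cosine similarity, the dictionary layout of the database, and the helper names `central_frame` and `retrieve_top_k` are assumptions for illustration; the caption does not state which similarity score the authors use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def central_frame(shot_frames):
    """Pick the middle frame of a shot (upper middle for even lengths)."""
    if not shot_frames:
        raise ValueError("a shot must contain at least one frame")
    return shot_frames[len(shot_frames) // 2]


def retrieve_top_k(query_embedding, database, k=5):
    """Return the k (frame_id, score) pairs with the highest similarity.

    ASSUMPTION: `database` maps a frame id to its embedding, and cosine
    similarity stands in for whatever score the trained network produces.
    """
    scored = [
        (frame_id, cosine_similarity(query_embedding, embedding))
        for frame_id, embedding in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A typical use would build the database by embedding `central_frame(shot)` for each test shot, then call `retrieve_top_k` once per query image.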