Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections

Francesc Net, Marc Folia, Pep Casals, Lluis Gomez

TL;DR

A transductive learning approach that leverages state-of-the-art deep learning architectures, such as convolutional neural networks (CNNs) and Vision Transformers (ViTs), is proposed. It outperforms the baseline methods at near-duplicate image detection on the UKBench dataset and an in-house private dataset.

Abstract

This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case, in which a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time archivists spend on manual annotation. This use case differs from laboratory settings because the deployment dataset is available in advance, which allows the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). Our approach involves pre-training a deep neural network on a large dataset and then fine-tuning it on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods at near-duplicate image detection on the UKBench dataset and an in-house private dataset.
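
As a rough illustration of the detection step, the sketch below flags near-duplicate candidates by comparing image embeddings pairwise. It assumes each image has already been mapped to a feature vector by a fine-tuned encoder and that vectors are compared with cosine similarity; the abstract does not state the similarity measure or threshold, so `cosine_similarity`, `find_near_duplicates`, the 0.95 threshold, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.95,  # assumed value, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Return every image pair whose similarity is at least `threshold`."""
    names = sorted(embeddings)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                pairs.append((first, second, score))
    # Most similar pairs first, so an archivist can review the likeliest duplicates.
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


# Toy vectors standing in for encoder outputs (hypothetical values).
toy_embeddings = {
    "scan_001.jpg": [0.90, 0.10, 0.00],
    "scan_002.jpg": [0.88, 0.12, 0.01],
    "scan_003.jpg": [0.00, 0.20, 0.95],
}
duplicates = find_near_duplicates(toy_embeddings, threshold=0.95)
for first, second, score in duplicates:
    print(f"{first} ~ {second} (similarity {score:.3f})")
```

The exhaustive pairwise loop is quadratic in the number of images, which is fine for a sketch; a large collection would more plausibly use an approximate nearest-neighbor index.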

Paper Structure

This paper contains 13 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Duplicate and near-duplicate images can appear in a given photo collection for different reasons: (a) shots from the same studio session, (b) scanned images from two paper originals with slight variations due to paper wear over time, (c) different shots in a sequence of the same scene, etc. The images shown in this figure are reproduced from the DigitaltMuseum under Creative Commons licenses (CC BY-NC-ND and CC Public Domain). The respective authors are: (a) Carl Johansson, (b) Anna Riwkin-Brick, and (c) Sune Sundahl.
  • Figure 2: An overview of the near-duplicate detection system.
  • Figure 3: An illustration of the SimCLR training process.
  • Figure 4: An illustration of the Masked Autoencoders' workflow.
  • Figure 5: Examples of near-duplicate images from the UKBench dataset. Each image query has 3 near-duplicates that correspond to different points of view, camera rotations, illumination changes, etc.
  • ...and 2 more figures
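
Figure 3 refers to SimCLR training. As originally published, SimCLR trains with the normalized temperature-scaled cross-entropy (NT-Xent) loss, which pulls two augmented views of the same image together and pushes all other views in the batch apart. The sketch below is a minimal pure-Python version of that loss; whether the paper uses exactly this formulation or temperature is not stated in this excerpt, and `nt_xent_loss` and its default `temperature` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def nt_xent_loss(
    views_a: List[Sequence[float]],
    views_b: List[Sequence[float]],
    temperature: float = 0.5,  # assumed value for illustration
) -> float:
    """NT-Xent loss; views_a[i] and views_b[i] are two views of the same image."""
    n = len(views_a)
    z = list(views_a) + list(views_b)  # 2N embeddings in one batch
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive view for embedding i
        numerator = math.exp(_cosine(z[i], z[j]) / temperature)
        # Denominator sums over every other embedding in the batch (k != i).
        denominator = sum(
            math.exp(_cosine(z[i], z[k]) / temperature)
            for k in range(2 * n)
            if k != i
        )
        total += -math.log(numerator / denominator)
    return total / (2 * n)


# Two toy "images", each with two views (hypothetical embeddings).
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned views: {aligned:.4f}, mismatched views: {mismatched:.4f}")
```

The loss is lower when paired views agree than when the pairing is scrambled, which is the signal that lets the encoder be fine-tuned on an unlabeled collection.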