Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

V. Sevetlidis; V. Arampatzakis; M. Karta; I. Mourthos; D. Tsiafaki; G. Pavlidis

Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

V. Sevetlidis, V. Arampatzakis, M. Karta, I. Mourthos, D. Tsiafaki, G. Pavlidis

Abstract

We formulate curator-in-the-loop duplicate discovery in the AtticPOT repository as a Positive-Unlabeled (PU) learning problem. Given a single anchor per artefact, we train a lightweight per-query Clone Encoder on augmented views of the anchor and score the unlabeled repository with an interpretable threshold on the latent l_2 norm. The system proposes candidates for curator verification, uncovering cross-record duplicates that were not verified a priori. On CIFAR-10 we obtain F1=96.37 (AUROC=97.97); on AtticPOT we reach F1=90.79 (AUROC=98.99), improving F1 by +7.70 points over the best baseline (SVDD) under the same lightweight backbone. Qualitative "find-similar" panels show stable neighbourhoods across viewpoint and condition. The method avoids explicit negatives, offers a transparent operating point, and fits de-duplication, record linkage, and curator-in-the-loop workflows.

Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

Abstract

Detecting Media Clones in Cultural Repositories Using a Positive Unlabeled Learning Approach

Abstract

Paper Structure

Table of Contents

Figures (6)