Relational Self-supervised Distillation with Compact Descriptors for Image Copy Detection

Juntae Kim, Sungwon Woo, Jongho Nang

TL;DR

This paper proposes an image copy detection method that achieves competitive performance with a lightweight network and compact descriptors. It introduces relational self-supervised distillation to learn flexible representations in a smaller feature space.

Abstract

Image copy detection is the task of detecting edited copies of any image within a reference database. While previous approaches have shown remarkable progress, the large size of their networks and descriptors remains a disadvantage, complicating their practical application. In this paper, we propose a novel method that achieves competitive performance by using a lightweight network and compact descriptors. By utilizing relational self-supervised distillation to transfer knowledge from a large network to a small network, we enable the training of lightweight networks with smaller descriptor sizes. We introduce relational self-supervised distillation for flexible representation in a smaller feature space and apply contrastive learning with a hard negative loss to prevent dimensional collapse. For the DISC2021 benchmark, ResNet-50 and EfficientNet-B0 are used as the teacher and student models, respectively, with micro average precision improving by 5.0%/4.9%/5.9% for 64/128/256 descriptor sizes compared to the baseline method. The code is available at https://github.com/juntae9926/RDCD.
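
The abstract describes transferring knowledge from a large teacher to a small student through relations between instances rather than through the features themselves. The following is a minimal NumPy sketch of one common way to express that idea, not the authors' implementation: each network turns its similarities to a shared set of anchor images into a probability distribution, and the student is trained to match the teacher's distribution. The function names, the anchor set, and the temperatures `tau_s` and `tau_t` are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def relational_distillation_loss(student, teacher, student_anchors,
                                 teacher_anchors, tau_s=0.1, tau_t=0.05):
    """Cross-entropy between teacher and student similarity distributions.

    student:         (B, d_s) student descriptors of a batch
    teacher:         (B, d_t) teacher descriptors of the same batch
    student_anchors: (K, d_s) student descriptors of K anchor images
    teacher_anchors: (K, d_t) teacher descriptors of the same K anchors
    d_s and d_t may differ: only the (B, K) similarity tables are compared.
    """
    sim_s = l2_normalize(student) @ l2_normalize(student_anchors).T
    sim_t = l2_normalize(teacher) @ l2_normalize(teacher_anchors).T
    p_teacher = np.exp(log_softmax(sim_t / tau_t))   # target distribution
    log_p_student = log_softmax(sim_s / tau_s)
    return float(-(p_teacher * log_p_student).sum(axis=1).mean())

# Hypothetical shapes: a 2048-d teacher and a 64-d student descriptor.
rng = np.random.default_rng(0)
loss = relational_distillation_loss(
    student=rng.normal(size=(8, 64)),
    teacher=rng.normal(size=(8, 2048)),
    student_anchors=rng.normal(size=(32, 64)),
    teacher_anchors=rng.normal(size=(32, 2048)),
)
```

Because the loss compares only similarity tables, the student descriptor can have any dimension. This is one reading of the "flexible representation in a smaller feature space" that the abstract mentions.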

Paper Structure (28 sections, 7 equations, 6 figures, 15 tables)

Figures (6)

  • Figure 1: Comparison of RDCD (Ours) and other image copy detection methods. RDCD utilizes a lightweight network and achieves high performance with compact descriptor sizes.
  • Figure 2: The overall pipeline of the proposed Relational Self-supervised Distillation for Image Copy Detection (RDCD). The method combines three key components: (1) Relational Self-supervised Distillation (RSD), which transfers knowledge from a pre-trained teacher network to a lightweight student network, (2) Contrastive Learning, which uses MoCo-v2, and (3) Hard Negative (HN) Loss, which addresses challenging negative samples. The student network $f^S_\theta$ extracts features $h^S$ and $h^{S'}$ from two augmented views of an input image. These features are then projected to a lower dimension for use in contrastive learning. The teacher network $f^T_\theta$ guides RSD by comparing similarities between instances in its feature space. The final RDCD loss is a weighted combination of RSD loss ($\mathcal{L}_{rel}$), contrastive loss ($\mathcal{L}_{con}$), and HN loss ($\mathcal{L}_{hn}$), enabling the student to learn compact yet effective descriptors for image copy detection.
  • Figure 3: Log of singular values for a descriptor size of 256, with and without HN loss.
  • Figure 4: Comparison of the difference in similarity between positives and nearest negatives with and without the use of HN loss.
  • Figure 5: t-SNE visualization of descriptors from different model configurations: a) DINO baseline, b) DINO-MoCo-v2 w/ HN loss, c) SSCD-MoCo-v2 w/ HN loss, d) DINO-MoCo-v2 w/o HN loss, e) SSCD-MoCo-v2 w/o HN loss. In a), b), and c), the final features are used as descriptors, whereas in d) and e), intermediate features are used. We randomly select 9 classes from ImageNet and visualize 50 images from each class.
  • ...and 1 more figure
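
Figure 3 above plots the log of singular values of the descriptors as a check for dimensional collapse. A minimal sketch of how such a spectrum is commonly computed follows, assuming the singular values are taken from the covariance matrix of the descriptors. That choice, the function name `log_singular_values`, and the synthetic data are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

def log_singular_values(descriptors, eps=1e-12):
    """Return log singular values (descending) of the descriptor covariance.

    descriptors: (N, D) array with one descriptor per row.
    A sharp drop in the returned spectrum suggests that some of the D
    dimensions carry almost no variance, i.e. dimensional collapse.
    """
    x = np.asarray(descriptors, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)           # center each dimension
    cov = x.T @ x / (x.shape[0] - 1)                # (D, D) covariance
    s = np.linalg.svd(cov, compute_uv=False)        # sorted in descending order
    return np.log(s + eps)                          # eps avoids log(0)

# Synthetic illustration: 16-d descriptors that use all dimensions versus
# 16-d descriptors that actually live in a 4-d subspace.
rng = np.random.default_rng(0)
full = rng.normal(size=(1000, 16))
collapsed = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 16))

full_spectrum = log_singular_values(full)
collapsed_spectrum = log_singular_values(collapsed)
```

In this synthetic case `full_spectrum` stays roughly flat, while `collapsed_spectrum` falls off sharply after its fourth entry. A sharp fall of that kind is what a plot such as Figure 3 would reveal for collapsed descriptors.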