
Weakly Supervised Training for Hologram Verification in Identity Documents

Glen Pouliquen, Guillaume Chiron, Joseph Chazalon, Thierry Géraud, Ahmad Montaser Awal

TL;DR

This work tackles remote verification of Optically Variable Devices (OVDs) in identity documents captured by smartphones. It introduces a weakly supervised contrastive learning framework that uses a triplet loss with a margin $m=1$ and distance $d(x,y)=\lVert x-y\rVert_2$, enabling learning without per-frame labels and producing frame embeddings whose cosine similarities are used for a final Original/Attack decision. The approach is evaluated on MIDV-Holo and MIDV-2020 with ROI-focused regions and cross-validation, achieving leading performance on MIDV-Holo and robust attack detection, including photo replacement attacks. The authors provide an open-source baseline for comparison, extend the MIDV-Holo dataset protocol, and demonstrate the method’s generalization across datasets and backbones, underscoring the potential for scalable, data-efficient remote identity verification on commodity smartphones.
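
To make the loss named above concrete, here is a minimal sketch of a triplet loss with margin $m=1$ and Euclidean distance. It assumes the standard hinge form $\max(0,\, d(A,P) - d(A,N) + m)$, which this summary does not spell out; the function names and the toy 2-D embeddings are illustrative only and are not taken from the authors' code.

```python
import math


def l2_distance(x, y):
    """Euclidean distance d(x, y) = ||x - y||_2 between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss (assumed form): max(0, d(A,P) - d(A,N) + m)."""
    return max(
        0.0,
        l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin,
    )


# Toy 2-D embeddings, made up for illustration.
anchor = [0.0, 0.0]
positive = [0.0, 0.5]       # distance 0.5 from the anchor
far_negative = [3.0, 4.0]   # distance 5 from the anchor
near_negative = [0.0, 1.0]  # distance 1 from the anchor

# Negative already further away than the positive by more than the margin.
loss_far = triplet_loss(anchor, positive, far_negative)
# Negative inside the margin, so the triplet still contributes to the loss.
loss_near = triplet_loss(anchor, positive, near_negative)
```

With these values the first triplet contributes nothing, while the second yields a positive loss that pushes the negative further from the anchor.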

Abstract

We propose a method to remotely verify the authenticity of Optically Variable Devices (OVDs), often referred to as "holograms", in identity documents. Our method processes video clips captured with smartphones under common lighting conditions, and is evaluated on two public datasets: MIDV-Holo and MIDV-2020. Thanks to weakly supervised training, we optimize a feature extraction and decision pipeline that achieves new leading performance on MIDV-Holo, while maintaining high recall on documents from MIDV-2020 used as attack samples. It is also the first method, to date, to effectively address the photo replacement attack task, and it can be trained on genuine samples, attack samples, or both for increased performance. By enabling the verification of OVD shapes and dynamics with very little supervision, this work opens the way towards the use of massive amounts of unlabeled data to build robust remote identity document verification systems on commodity smartphones. Code is available at https://github.com/EPITAResearchLab/pouliquen.24.icdar

Paper Structure (19 sections, 1 equation, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Overview of the proposed approach, comprising 1) weakly supervised training with a specific data selection strategy over the training set; and 2) the inference pipeline, which extracts optimized features that are then used to compute the final "Original/Attack" decision by thresholding pairwise distances. The decision step illustrates how the threshold is calibrated on the validation part of the training set.
  • Figure 2: In the MIDV-Holo dataset, documents are captured in different places involving various backgrounds and lighting conditions (left). Document quads are annotated on all images, allowing rectification (center). Additionally, we propose to define a region of interest containing part of the face and the holograms in charge of securing it (right). Extracted Regions of Interest (ROIs) from samples labeled as "Originals" (below) contain more or less visible holographic content. Identities (names and faces) are synthetic.
  • Figure 3: Frame sampling strategy for building triplets [A]nchor/[P]ositive/[N]egative from original and attack videos. Each original triplet is sampled from a single original video (A and P at $t$, N at $t+1$). Each attack triplet is sampled from two different videos of the same identity, with A and P both belonging to a common video and N belonging to a different one. All samples are transformed with uniformly selected augmentations.
  • Figure 4: Integrated Gradients [Sundararajan et al., 2017] visualizes a training sample, emphasizing our method's effectiveness in directing the network's attention towards the hologram. In contrast, the ImageNet-trained model lacks this focused attribution to the hologram, highlighting the significance of our training approach.
  • Figure 5: Proposed split over the MIDV-Holo dataset (64% train, 16% validation and 20% test). MIDV-Holo "Vanilla" refers to the part tackled in the original paper. "Photo replacement" attacks are exclusively used for testing in our experiments.
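
The sampling rule of Figure 3 and the thresholded decision of Figure 1 can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, attack frames are drawn uniformly, similarities are aggregated by a plain mean over frame pairs, a low mean similarity is read as "Original" (frames of a genuine hologram are expected to differ), and the threshold is a hand-picked placeholder rather than one calibrated on validation data.

```python
import math
import random


def sample_original_triplet(num_frames, rng):
    """Frame indices (A, P, N) within one original video.

    A and P are the same frame t (they would differ only by augmentation);
    N is the next frame t + 1.
    """
    t = rng.randrange(num_frames - 1)
    return t, t, t + 1


def sample_attack_triplet(num_frames_a, num_frames_b, rng):
    """(video, frame) pairs for two attack videos of the same identity.

    A and P come from video "a", N from video "b". Drawing the frames
    uniformly and independently is an assumption of this sketch.
    """
    a = rng.randrange(num_frames_a)
    p = rng.randrange(num_frames_a)
    n = rng.randrange(num_frames_b)
    return ("a", a), ("a", p), ("b", n)


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Mean cosine similarity over all unordered pairs of frame embeddings."""
    count = len(embeddings)
    sims = [
        cosine_similarity(embeddings[i], embeddings[j])
        for i in range(count)
        for j in range(i + 1, count)
    ]
    return sum(sims) / len(sims)


def decide(embeddings, threshold):
    """Assumed rule: frames that vary a lot (low similarity) indicate an original."""
    if mean_pairwise_similarity(embeddings) < threshold:
        return "Original"
    return "Attack"


rng = random.Random(0)
original_triplet = sample_original_triplet(10, rng)
attack_triplet = sample_attack_triplet(10, 8, rng)

# Toy embeddings: identical frames versus frames that change over time.
static_frames = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varying_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
static_label = decide(static_frames, threshold=0.9)
varying_label = decide(varying_frames, threshold=0.9)
```

In practice the threshold would be chosen on the validation part of the training set, as Figure 1 indicates, rather than fixed by hand as it is here.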