IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection

Hong Guan; Yancheng Wang; Lulu Xie; Soham Nag; Rajeev Goel; Niranjan Erappa Narayana Swamy; Yingzhen Yang; Chaowei Xiao; Jonathan Prisby; Ross Maciejewski; Jia Zou

TL;DR

The paper evaluates the utility of the IDNet dataset and presents use cases, illustrating how it can aid in training privacy-preserving fraud detection methods, support the generation of camera and video captures of identity documents, and help test schema unification and other identity document management functionality.

Abstract

Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. Training accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark datasets for identity document analysis, including MIDV-500, MIDV-2020, and FMIDV, fall short in several respects: they offer a limited number of samples, cover too few varieties of fraud patterns, and seldom include alterations in critical personal identifying fields such as portrait images. These gaps limit their utility in training models capable of detecting realistic fraud while preserving privacy. In response to these shortcomings, our research introduces a new benchmark dataset, IDNet, designed to advance privacy-preserving fraud detection efforts. The IDNet dataset comprises 837,060 images of synthetically generated identity documents, totaling approximately 490 gigabytes, categorized into 20 types from 10 U.S. states and 10 European countries. We evaluate the utility of the dataset and present use cases, illustrating how it can aid in training privacy-preserving fraud detection methods, support the generation of camera and video captures of identity documents, and help test schema unification and other identity document management functionality.

Paper Structure (32 sections, 16 figures, 20 tables)

Introduction
Background
Identity Documents
A Survey of Existing Public Identity Document Datasets
The IDNet Benchmark Dataset
Generation of the Identity Document Dataset
Template Generation based on Image Diffusion Model
Metadata Information Generation
Add Generated Information to the Identity Document Templates
Fraud Patterns
Time and Monetary Costs
IDNet Quality Evaluation
Metadata Quality
Document Fidelity
Stealthiness of the Generated Fraud Data
...and 17 more sections

Figures (16)

Figure 1: Overview of IDNet: We have 5979 face portrait photos, used to create 5979 distinct samples for each document type. For each such sample, we further create six fraud samples with different fraud patterns.
Figure 2: Overview of identity documents, taking an Arizona driver's license card as an example. Personal identifying information is highlighted in red rectangles, and some example security features are highlighted in green rectangles.
Figure 3: Illustration of the identity document template generation process.
Figure 4: Illustration of the generated identity document templates (text such as "For Research Purposes" was added to prevent misuse).
Figure 5: Illustration of segment-based parameter search
...and 11 more figures
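
The dataset size quoted in the abstract can be reconciled with the Figure 1 caption by simple arithmetic. The sketch below assumes that all 20 document types are built from the same 5979 portraits and that every clean sample has exactly six fraud variants, as the caption suggests; the variable names are illustrative and do not come from the paper.

```python
# Counts taken from the abstract and the Figure 1 caption.
PORTRAITS = 5979        # distinct samples per document type
DOCUMENT_TYPES = 20     # 10 U.S. states + 10 European countries
FRAUD_PATTERNS = 6      # fraud samples created per clean sample

# Assumption: each clean sample yields one image per fraud pattern,
# so every portrait contributes 1 clean + 6 fraud images per type.
images_per_sample = 1 + FRAUD_PATTERNS
total_images = PORTRAITS * DOCUMENT_TYPES * images_per_sample

print(total_images)  # 837060
```

Under these assumptions the product matches the 837,060 images reported in the abstract.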
