CLIP4Sketch: Enhancing Sketch to Mugshot Matching through Dataset Augmentation using Diffusion Models

Kushal Kumar Jain, Steve Grosz, Anoop M. Namboodiri, Anil K. Jain

TL;DR

This work proposes CLIP4Sketch, an approach that uses diffusion models to generate a large and diverse set of sketch images for training face recognition systems, improving their performance in sketch-to-mugshot matching.

Abstract

Forensic sketch-to-mugshot matching is a challenging task in face recognition, primarily hindered by the scarcity of annotated forensic sketches and the modality gap between sketches and photographs. To address this, we propose CLIP4Sketch, a novel approach that leverages diffusion models to generate a large and diverse set of sketch images, which improves the performance of face recognition systems in sketch-to-mugshot matching. Our method uses Denoising Diffusion Probabilistic Models (DDPMs) to generate sketches with explicit control over identity and style. We combine CLIP and AdaFace embeddings of a reference mugshot, along with textual descriptions of style, as the conditions for the diffusion model. We demonstrate the efficacy of our approach by generating a comprehensive dataset of sketches corresponding to mugshots and training a face recognition model on our synthetic data. Our results show significant improvements in sketch-to-mugshot matching accuracy over training on the limited amount of existing real face sketch data, validating the potential of diffusion models to enhance the performance of face recognition systems across modalities. We also compare our dataset with datasets generated using GAN-based methods to show its superiority.
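
For reference, the abstract's description of a conditioned DDPM corresponds to the standard noise-prediction training objective sketched below. The symbols c_id (the combined CLIP and AdaFace embedding of the reference mugshot) and c_text (the embedding of the style prompt) are notation assumed here for illustration and are not taken from the paper, whose own two equations may be written differently. In a latent diffusion setting, x_0 would denote the encoded latent rather than the image itself.

```latex
% Forward (noising) process at step t, with cumulative noise schedule \bar{\alpha}_t
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Noise-prediction loss with identity and style conditions (assumed notation)
\mathcal{L} = \mathbb{E}_{x_0,\, \epsilon,\, t}
\left[ \left\| \epsilon - \epsilon_\theta\!\left(x_t, t, c_{\mathrm{id}}, c_{\mathrm{text}}\right) \right\|_2^2 \right]
```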

Paper Structure (15 sections, 2 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Qualitative comparison of sketches generated by our CLIP4Sketch model and real sketches. The first row contains examples from the existing datasets PRIP-Composites [klum] and CUHK [4624272], while the second row is generated by our method. The text prompts used to generate the images in the second row with the proposed CLIP4Sketch model are, in order from left to right, “a viewed software-generated sketch of a face”, “a software-generated sketch of a face”, “a viewed hand-drawn sketch of a face”, and “a hand-drawn sketch of a face”. The leftmost image is the input identity.
  • Figure 2: The CLIP4Sketch pipeline for generating diverse sketches from a mugshot image. A latent diffusion model is employed, which combines embeddings from CLIP and AdaFace to preserve identity and uses text prompts to control stylistic variations. We use a Canny edge image as the ControlNet condition, along with decoupled cross-attention layers for identity and style conditioning. The output images shown have the following similarity scores: 0.2582, 0.2773, 0.2518, 0.2558.
  • Figure 3: First, a t-SNE plot shows the modality gap between sketch images and mugshots, as well as the similarity between our generated sketches and real sketches. The first histogram shows the genuine and imposter score distributions for face-sketch pairs in real sketch datasets such as CUHK [4624272] and PRIP-Composites [klum], and the second histogram shows the distributions for the dataset generated using our proposed CLIP4Sketch.
  • Figure 4: ROC plots showing that face-to-face matching performance drops as more sketch data is used, while face-to-sketch matching performance trends upward as more data is used.
  • Figure 5: DET plot showing the open-set performance of AdaFace when trained on our dataset, compared against other GAN-generated datasets. This shows the potential of diffusion models for generating synthetic datasets.
  • ...and 1 more figures
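
The genuine and imposter score distributions (Figure 3) and the ROC and DET curves (Figures 4 and 5) can all be derived from the same pairwise similarity scores. The following is a minimal sketch of that evaluation, assuming cosine similarity between face embeddings and identity labels for each sketch and mugshot. The function names and the toy two-dimensional embeddings are hypothetical and chosen only for illustration; they are not the paper's evaluation code or data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def split_scores(sketch_embs, sketch_ids, photo_embs, photo_ids):
    """Score every sketch-photo pair and split by whether identities match."""
    genuine, imposter = [], []
    for s, sid in zip(sketch_embs, sketch_ids):
        for p, pid in zip(photo_embs, photo_ids):
            (genuine if sid == pid else imposter).append(cosine(s, p))
    return genuine, imposter


def roc_points(genuine, imposter):
    """Return (threshold, FAR, TAR) triples, one per distinct score.

    A DET curve plots the same thresholds as FAR against 1 - TAR.
    """
    points = []
    for t in sorted(set(genuine + imposter), reverse=True):
        tar = sum(g >= t for g in genuine) / len(genuine)
        far = sum(i >= t for i in imposter) / len(imposter)
        points.append((t, far, tar))
    return points


def tar_at_far(genuine, imposter, max_far):
    """Highest true accept rate whose false accept rate is at most max_far."""
    best = 0.0
    for _, far, tar in roc_points(genuine, imposter):
        if far <= max_far:
            best = max(best, tar)
    return best


# Toy embeddings (hypothetical): two identities, one sketch and one photo each.
sketch_embs = [[1.0, 0.0], [0.0, 1.0]]
photo_embs = [[0.9, 0.1], [0.2, 0.8]]
ids = [0, 1]

genuine, imposter = split_scores(sketch_embs, ids, photo_embs, ids)
print("TAR at FAR = 0:", tar_at_far(genuine, imposter, 0.0))
```

A wider separation between the two score lists corresponds to less overlap between the histograms and to a higher true accept rate at a fixed false accept rate.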