TeLL Me what you cant see

Saverio Cavasin, Pietro Biasetton, Mattia Tamiazzo, Mauro Conti, Simone Milani

TL;DR

The scarcity and poor quality of mugshot images impede reliable identification in investigations. The authors propose a forensic mugshot augmentation framework that combines image enhancement, linguistic description, and diffusion-based augmentation, ensembling multiple generative networks with Vision-Language Models to expand the usable data while preserving identity. A semantic Hamming distance metric evaluates description fidelity, while aging synthesis enables the study of temporal changes; experiments show improved re-identification robustness across several scenarios. While promising for witness consultations and missing-person searches, the approach exhibits biases (notably across ethnic groups) and aging-related difficulties, both of which call for further robustness work and real-world validation.
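The TL;DR names a semantic Hamming distance for scoring how faithfully the description of a generated image matches that of the original. This summary does not give the metric's definition, so the following is only an illustrative sketch under our own assumptions: each description is reduced to a dictionary of categorical facial attributes (the attribute names are hypothetical), values are compared after case and whitespace normalization, an attribute present in only one description counts as a mismatch, and the distance is the fraction of attributes that disagree. The paper's actual formulation may differ.

```python
from typing import Dict, Optional


def _canon(value: Optional[str]) -> Optional[str]:
    # Normalize case and surrounding whitespace so "Black " and "black" match.
    return value.strip().lower() if value is not None else None


def semantic_hamming_distance(
    reference: Dict[str, str],
    generated: Dict[str, str],
    normalize: bool = True,
) -> float:
    """Hamming-style distance between two attribute-based face descriptions.

    Assumption (not from the paper): each description maps a facial attribute
    (e.g. "hair_color") to a categorical value. An attribute is a mismatch if
    its values differ or if it appears in only one of the two descriptions.
    """
    attributes = set(reference) | set(generated)
    if not attributes:
        return 0.0
    mismatches = sum(
        1
        for attr in attributes
        if _canon(reference.get(attr)) != _canon(generated.get(attr))
    )
    # Optionally divide by the number of attributes to get a value in [0, 1].
    return mismatches / len(attributes) if normalize else float(mismatches)


if __name__ == "__main__":
    # Hypothetical descriptions of an original mugshot and a synthetic variant.
    original = {"hair_color": "black", "eye_color": "brown", "beard": "yes", "glasses": "no"}
    synthetic = {"hair_color": "Black", "eye_color": "blue", "beard": "yes", "glasses": "no"}
    print(semantic_hamming_distance(original, synthetic))  # 0.25
```

Here only the eye color disagrees, so one of four attributes mismatches. Normalizing by the attribute count keeps scores comparable across descriptions of different lengths; passing `normalize=False` returns the raw mismatch count instead.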

Abstract

During criminal investigations, images of persons of interest directly influence the success of identification procedures. However, law enforcement agencies often face a scarcity of high-quality images, or have only outdated ones, which can reduce the accuracy and success of person-search processes. This paper introduces a novel forensic mugshot augmentation framework aimed at addressing these limitations. Our approach increases the probability of identifying individuals by generating additional high-quality images through customizable data augmentation techniques, while maintaining the biometric integrity and consistency of the original data. Experimental results show that our method significantly improves identification accuracy and robustness across various forensic scenarios, demonstrating its effectiveness as a trustworthy tool for law enforcement applications.

Index Terms: Digital Forensics, Person re-identification, Feature extraction, Data augmentation, Visual-Language models.

Paper Structure

This paper contains 28 sections, 1 equation, 20 figures, 2 tables.

Figures (20)

  • Figure 1: Automatic synthetic mugshot generation.
  • Figure 2: Example of an FBI wanted poster.
  • Figure 3: Actual mugshots employed in this work.
  • Figure 4: PhotoMaker structure scheme. In the scheme, orange marks the network's inputs, blue the individual networks composing the model, green the embeddings, and red the output.
  • Figure 5: Generation of specific features.
  • ...and 15 more figures