TeLL Me what you cant see
Saverio Cavasin, Pietro Biasetton, Mattia Tamiazzo, Mauro Conti, Simone Milani
TL;DR
Scarce and low-quality mugshot images impede reliable identification in investigations. The authors propose a forensic mugshot augmentation framework that pairs an ensemble of generative networks with Vision-Language Models, combining image enhancement, linguistic description, and diffusion-based augmentation to expand the usable data while preserving identity. A semantic Hamming distance metric and aging synthesis support the evaluation of description fidelity and temporal changes, respectively, and experiments show improved re-identification robustness in several scenarios. Although promising for witness consultations and missing-person searches, the approach exhibits biases (notably across ethnic groups) and aging-related challenges that call for further robustness work and real-world validation.
Abstract
During criminal investigations, images of persons of interest directly influence the success of identification procedures. However, law enforcement agencies often face challenges related to the scarcity of high-quality images or their obsolescence, which can reduce the accuracy and success of person-search procedures. This paper introduces a novel forensic mugshot augmentation framework aimed at addressing these limitations. Our approach increases the probability of identifying individuals by generating additional, high-quality images through customizable data augmentation techniques, while maintaining the biometric integrity and consistency of the original data. Experimental results show that our method significantly improves identification accuracy and robustness across various forensic scenarios, demonstrating its effectiveness as a trustworthy tool for law enforcement applications.
Index Terms: Digital Forensics, Person re-identification, Feature extraction, Data augmentation, Visual-Language models.
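The TL;DR mentions a semantic Hamming distance used to evaluate description fidelity, but this front matter does not define it. As a minimal sketch only, assuming each linguistic description has already been reduced to a set of categorical attributes, one plausible reading is the fraction of attributes on which two descriptions disagree. The function name `semantic_hamming`, the helper `_norm`, and the attribute names below are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Optional


def _norm(value: Optional[str]) -> Optional[str]:
    """Normalize an attribute value so case and spacing do not count as mismatches."""
    return None if value is None else value.strip().lower()


def semantic_hamming(a: Mapping[str, str], b: Mapping[str, str]) -> float:
    """Normalized Hamming distance between two attribute descriptions.

    Assumption (not from the paper): a description is a mapping from attribute
    name to categorical value. An attribute missing from one description
    counts as a mismatch. Returns a value in [0, 1]; 0 means identical.
    """
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    mismatches = sum(1 for k in keys if _norm(a.get(k)) != _norm(b.get(k)))
    return mismatches / len(keys)


# Hypothetical attributes extracted from a reference mugshot and an augmented image.
reference = {"hair_color": "brown", "beard": "yes", "glasses": "no", "eye_color": "blue"}
generated = {"hair_color": "Brown", "beard": "no", "glasses": "no", "eye_color": "blue"}

distance = semantic_hamming(reference, generated)
print(distance)  # one of four attributes differs
```

Under this reading, a lower distance would indicate that the description of the augmented image stays closer to the description of the original; the actual metric in the paper may weight or encode attributes differently.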
