Embedding Aggregation for Forensic Facial Comparison
Rafael Oliveira Ribeiro, João C. R. Neves, Arnout C. C. Ruifrok, Flavio de Barros Vidal
TL;DR
This work tackles forensic facial comparison under uncontrolled imaging conditions by introducing an embedding-aggregation framework. The framework builds a single descriptor $\mathbf{v}^* = \sum_{i=1}^N w_i \mathbf{v}^{t_i}$ from the embeddings $\mathbf{v}^{t_i}$ of $N$ trace images, compares it against a reference embedding using the cosine similarity $s$, and then maps that score to a likelihood ratio $LR = \frac{Pr(s|H_p)}{Pr(s|H_d)}$. It analyzes three weighting strategies (Ser-Fiq Pooling, Confusion Score Pooling, and Average Pooling) and uses regularized logistic regression for robust score-to-LR conversion. Empirical results on surveillance (SCface, Quis-Campi) and social-media (Adience, BFW) datasets show substantial reductions in $C_{llr}$, particularly for low-quality images, with additional gains when more embeddings are aggregated and when the datasets are cleaned. The findings support the practical value of embedding aggregation for forensic evidence evaluation and point to future directions, including data collection at CCTV scales and neural aggregation approaches that incorporate typicality into the LR mapping.
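The aggregation and comparison steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `aggregate_embeddings` and `cosine_similarity` are hypothetical, the weights are assumed to be supplied by some quality measure, and normalizing them to sum to one is an assumption (it does not change the cosine similarity, which is scale-invariant). Passing no weights corresponds to Average Pooling.

```python
import numpy as np


def aggregate_embeddings(embeddings, weights=None):
    """Weighted sum v* = sum_i w_i * v_i over N trace embeddings.

    embeddings: array-like of shape (N, D), one row per trace image.
    weights:    optional array-like of shape (N,), e.g. quality scores.
                If omitted, uniform weights are used (average pooling).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n = embeddings.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)
    else:
        weights = np.asarray(weights, dtype=float)
        # Assumption: rescale weights to sum to 1. This only changes the
        # length of v*, not its direction.
        weights = weights / weights.sum()
    return weights @ embeddings


def cosine_similarity(a, b):
    """Cosine similarity s between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage with made-up 4-dimensional embeddings.
trace_embeddings = [[0.9, 0.1, 0.0, 0.2],
                    [0.7, 0.3, 0.1, 0.1],
                    [0.8, 0.0, 0.2, 0.3]]
reference_embedding = [0.85, 0.15, 0.05, 0.2]

v_star = aggregate_embeddings(trace_embeddings, weights=[0.5, 0.2, 0.3])
s = cosine_similarity(v_star, reference_embedding)
```

The score `s` would then be passed to a calibration model (regularized logistic regression in the paper) to obtain a likelihood ratio; that step is not shown here.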
Abstract
In forensic facial comparison, questioned-source images are usually captured in uncontrolled environments, with non-uniform lighting, and from non-cooperative subjects. The poor quality of such material usually compromises its value as evidence in legal matters. On the other hand, in forensic casework, multiple images of the person of interest are usually available. In this paper, we propose to aggregate deep neural network embeddings from various images of the same person to improve performance in facial verification. We observe significant performance improvements, especially for very low-quality images. Further improvements are obtained by aggregating embeddings of more images and by applying quality-weighted aggregation. We demonstrate the benefits of this approach in forensic evaluation settings with the development and validation of score-based likelihood ratio systems and report improvements in $C_{llr}$ of up to 95% (from 0.249 to 0.012) for CCTV images and of up to 96% (from 0.083 to 0.003) for social media images.
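For readers unfamiliar with the metric, the log-likelihood-ratio cost is commonly defined as $C_{llr} = \frac{1}{2}\left[\frac{1}{N_p}\sum_{i}\log_2\left(1 + \frac{1}{LR_i}\right) + \frac{1}{N_d}\sum_{j}\log_2\left(1 + LR_j\right)\right]$, where the first sum runs over same-source comparisons and the second over different-source comparisons. The sketch below uses this standard definition and also reproduces the relative reductions quoted in the abstract. The function names `cllr` and `relative_reduction` are hypothetical, and the evaluation code used in the paper may differ.

```python
import numpy as np


def cllr(lr_same_source, lr_different_source):
    """Log-likelihood-ratio cost under its standard definition.

    lr_same_source:      LRs from comparisons where H_p is true.
    lr_different_source: LRs from comparisons where H_d is true.
    Lower is better; a system that always outputs LR = 1 scores 1.0.
    """
    lr_ss = np.asarray(lr_same_source, dtype=float)
    lr_ds = np.asarray(lr_different_source, dtype=float)
    penalty_ss = np.mean(np.log2(1.0 + 1.0 / lr_ss))  # small LRs are penalized
    penalty_ds = np.mean(np.log2(1.0 + lr_ds))        # large LRs are penalized
    return 0.5 * (penalty_ss + penalty_ds)


def relative_reduction(before, after):
    """Fractional improvement when a cost drops from `before` to `after`."""
    return (before - after) / before


# Figures quoted in the abstract.
cctv_gain = relative_reduction(0.249, 0.012)          # roughly 0.95
social_media_gain = relative_reduction(0.083, 0.003)  # roughly 0.96
```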
