Embedding Aggregation for Forensic Facial Comparison

Rafael Oliveira Ribeiro; João C. R. Neves; Arnout C. C. Ruifrok; Flavio de Barros Vidal

TL;DR

This work tackles forensic facial comparison under uncontrolled imaging conditions with an embedding-aggregation framework. It builds a single descriptor $\mathbf{v}^* = \sum_{i=1}^N w_i \mathbf{v}^{t_i}$ from the embeddings of $N$ trace images, compares it with a reference using cosine similarity $s$, and maps that score to a likelihood ratio $LR = \frac{\Pr(s \mid H_p)}{\Pr(s \mid H_d)}$. Three weighting strategies are analyzed: Ser-Fiq Pooling, Confusion Score Pooling, and Average Pooling. Scores are converted to LRs with regularized logistic regression. Results on surveillance (SCface, Quis-Campi) and social-media (Adience, BFW) datasets show substantial reductions in $C_{llr}$, particularly for low-quality images, with additional gains when more embeddings are aggregated and when identity labeling errors are removed from the datasets. The findings support the practical value of embedding aggregation for forensic evidence evaluation and point to future directions, including data collection at CCTV scales and neural aggregation approaches that incorporate typicality into the LR mapping.
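The aggregation and comparison steps summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `aggregate_embeddings` and `cosine_similarity` are hypothetical, the toy 2-D vectors stand in for real face embeddings, and normalizing the weights to sum to 1 is an assumption (the summary only gives $\mathbf{v}^* = \sum_{i=1}^N w_i \mathbf{v}^{t_i}$).

```python
import numpy as np

def aggregate_embeddings(embeddings, weights=None):
    """Weighted sum v* = sum_i w_i * v_i over N trace embeddings.

    With weights=None every trace gets weight 1/N (average pooling).
    Assumption: supplied weights are rescaled to sum to 1.
    """
    embeddings = np.asarray(embeddings, dtype=float)  # shape (N, D)
    n = embeddings.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
    return weights @ embeddings  # shape (D,)

def cosine_similarity(a, b):
    """Cosine similarity s between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: two trace embeddings and one reference embedding.
traces = [[1.0, 0.0], [0.0, 1.0]]
reference = [1.0, 1.0]

v_avg = aggregate_embeddings(traces)                       # average pooling
v_weighted = aggregate_embeddings(traces, weights=[3, 1])  # quality-style weights
s_avg = cosine_similarity(v_avg, reference)
s_weighted = cosine_similarity(v_weighted, reference)
```

Because cosine similarity is invariant to the scale of its inputs, rescaling all weights by a common factor does not change $s$; only the relative weights of the traces matter.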

Abstract

In forensic facial comparison, questioned-source images are usually captured in uncontrolled environments, with non-uniform lighting, and from non-cooperative subjects. The poor quality of such material often compromises its value as evidence in legal matters. On the other hand, multiple images of the person of interest are usually available in forensic casework. In this paper, we propose to aggregate deep neural network embeddings from various images of the same person to improve performance in facial verification. We observe significant performance improvements, especially for very low-quality images. Further improvements are obtained by aggregating embeddings of more images and by applying quality-weighted aggregation. We demonstrate the benefits of this approach in forensic evaluation settings with the development and validation of score-based likelihood ratio systems and report improvements in $C_{llr}$ of up to 95% (from 0.249 to 0.012) for CCTV images and of up to 96% (from 0.083 to 0.003) for social media images.
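For readers unfamiliar with the metric, the sketch below computes $C_{llr}$ and checks the relative reductions quoted above. It assumes the paper uses the standard log-likelihood-ratio cost, $C_{llr} = \frac{1}{2}\left[\frac{1}{N_p}\sum_{i}\log_2\left(1 + \frac{1}{LR_i}\right) + \frac{1}{N_d}\sum_{j}\log_2\left(1 + LR_j\right)\right]$, where the first sum runs over same-source comparisons and the second over different-source comparisons. The function names `cllr` and `relative_reduction` are hypothetical; only the four $C_{llr}$ values come from the abstract.

```python
import numpy as np

def cllr(lrs_same_source, lrs_different_source):
    """Log-likelihood-ratio cost (standard definition, assumed here).

    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    lr_p = np.asarray(lrs_same_source, dtype=float)
    lr_d = np.asarray(lrs_different_source, dtype=float)
    penalty_p = np.mean(np.log2(1.0 + 1.0 / lr_p))  # small LRs under H_p are penalized
    penalty_d = np.mean(np.log2(1.0 + lr_d))        # large LRs under H_d are penalized
    return 0.5 * (penalty_p + penalty_d)

def relative_reduction(before, after):
    """Fractional improvement when a cost drops from `before` to `after`."""
    return (before - after) / before

# An uninformative system (LR = 1 everywhere) has C_llr = 1.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])

# Relative reductions for the C_llr values quoted in the abstract.
cctv_gain = relative_reduction(0.249, 0.012)          # about 0.95
social_media_gain = relative_reduction(0.083, 0.003)  # about 0.96
```

Values of $C_{llr}$ close to 0 indicate LRs that are both discriminating and well calibrated, which is why the drops to 0.012 and 0.003 are notable.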


Paper Structure (17 sections, 9 equations, 8 figures, 2 tables)


Introduction
Related Work
Proposed Method
Ser-Fiq Pooling
CS Pooling
Average Pooling
Data
Surveillance Datasets
Novel Verification Protocol for the Quis-Campi dataset
Social Media Datasets
Definition of References for Adience and BFW Datasets
Identity Errors in Adience and BFW Datasets
Experiments
Face Recognition Model
Score-to-LR Model
...and 2 more sections

Figures (8)

Figure 1: Comparison of the proposed framework with traditional forensic facial analysis systems.
Figure 2: Examples of references selected for the Adience and BFW datasets. For each identity, the face at the top left (in green) is selected as a reference, and the others are used as traces.
Figure 3: Bi-modal behavior of genuine scores distributions for the Adience (a) and BFW (b) datasets, suggestive of identity labeling errors. After cleaning, the genuine distributions no longer exhibit this bi-modal behavior (c, d).
Figure 4: Examples of identity labeling errors (red boxes) in the Adience and BFW datasets.
Figure 5: Distributions of Confusion Scores for the references and probes from the BFW and Adience datasets, before and after cleaning.
...and 3 more figures
