An algorithm for forensic toolmark comparisons

Maria Cuellar; Sheng Gao; Heike Hofmann

TL;DR

This work tackles the subjectivity of forensic toolmark analysis by introducing an objective, probabilistic framework built on 3D toolmark data from consecutively manufactured screwdrivers. It combines a data-driven clustering step (PAM) to understand variability by source with density-based separation of Known-Match and Known-Non-Match pairs, and finally derives likelihood ratios via Beta-distributed densities to provide interpretable evidence metrics. The approach achieves high cross-validated performance (e.g., $0.98$ sensitivity and $0.96$ specificity in the primary experiment) and identifies a practical signal-length threshold (approximately $1.5$ mm) below which reliable classification is unlikely, while remaining robust to angle/direction changes within the studied range. The proposed open-source pipeline and datasets enable forensic examiners to produce transparent, LR-based conclusions and pave the way for broader generalization to other tools, contingent on expanded data collection.

Abstract

Forensic toolmark analysis traditionally relies on subjective human judgment, leading to inconsistencies and a lack of transparency. The multitude of variables, including the angle and direction of mark generation, further complicates comparisons. To address this, we first generate a dataset of 3D toolmarks made at various angles and directions using consecutively manufactured slotted screwdrivers. Using PAM clustering, we find that the toolmarks cluster by tool rather than by angle or direction. Using Known Match and Known Non-Match densities, we establish thresholds for classification. By fitting Beta distributions to these densities, we can derive likelihood ratios for new toolmark pairs. With a cross-validated sensitivity of 98% and specificity of 96%, our approach enhances the reliability of toolmark analysis. The approach is applicable to slotted screwdrivers and to screwdrivers made with a similar production method. With data collected on other tools and factors, it could be applied to compare toolmarks of other types. This empirically trained, open-source solution offers forensic examiners a standardized means to objectively compare toolmarks, potentially decreasing the number of miscarriages of justice in the legal system.
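The last step described in the abstract, turning Known Match (KM) and Known Non-Match (KNM) score densities into a likelihood ratio, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes similarity scores lie strictly between 0 and 1, it fits each Beta distribution by the method of moments (the abstract does not say which fitting procedure is used), and the scores are synthetic draws rather than the paper's data. The function names `fit_beta_moments`, `beta_pdf`, and `likelihood_ratio` are hypothetical.

```python
import math
import random
from statistics import mean, variance


def fit_beta_moments(scores):
    """Method-of-moments Beta fit (an assumed fitting choice) for scores in (0, 1)."""
    m = mean(scores)
    v = variance(scores)
    if not 0.0 < m < 1.0 or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("scores are not compatible with a Beta fit")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common  # (alpha, beta)


def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), evaluated in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def likelihood_ratio(score, km_params, knm_params):
    """Score-based LR: KM density divided by KNM density at the observed score."""
    return beta_pdf(score, *km_params) / beta_pdf(score, *knm_params)


# Synthetic scores for illustration only; these are NOT the paper's data.
rng = random.Random(0)
km_scores = [rng.betavariate(8.0, 2.0) for _ in range(500)]   # same-source pairs
knm_scores = [rng.betavariate(2.0, 6.0) for _ in range(500)]  # different-source pairs

km_params = fit_beta_moments(km_scores)
knm_params = fit_beta_moments(knm_scores)

lr_high = likelihood_ratio(0.9, km_params, knm_params)  # score typical of a match
lr_low = likelihood_ratio(0.2, km_params, knm_params)   # score typical of a non-match
print(f"LR at 0.9: {lr_high:.3g}; LR at 0.2: {lr_low:.3g}")
```

A ratio above 1 supports the same-source proposition and a ratio below 1 supports the different-source proposition. A maximum-likelihood fit could replace the method of moments without changing the rest of the sketch.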

Paper Structure (22 sections, 4 equations, 16 figures, 3 tables)

Introduction
Previous work
Data generation
Experimental design
Materials
Signal extraction
Data
Methods
Method 1: Similarity matrices and clustering to study variability by source, angle, and direction
Method 2: Known-match and known-non-match densities to classify same- and different-source
Method 3: Score-based likelihood ratio to provide probabilistic interpretation
Results
Method 1: Similarity scores and clustering
Method 2: Densities
Method 3: Likelihood ratio
...and 7 more sections

Figures (16)

Figure 1: Screwdriver tip generating a striated toolmark on the substrate material. This toolmark is made at a 50 degree angle of attack and in the "pulling" direction. Image adapted from garcia2017influence.
Figure 2: Materials for generating toolmarks and scanning them in 3D.
Figure 3: Steps to extract the signals from the 3D toolmark scans.
Figure 4: Replicate signals from a single source (small tool 1), at a fixed angle of attack (80 degrees) and direction of tool generation (pull). The black signal is the average of the rest.
Figure 5: Averaged replicate signals from a single source (large tool 1), at three angles of attack (60, 70, and 80 degrees) and a fixed direction (pull). The eight replicates made at each angle were averaged, so only three signals are shown. The black signal is the average of the other three curves. Note that these signals are wider than those of experiments 1 and 2 because they were made with larger screwdrivers.
...and 11 more figures
