An algorithm for forensic toolmark comparisons
Maria Cuellar, Sheng Gao, Heike Hofmann
TL;DR
This work tackles the subjectivity of forensic toolmark analysis by introducing an objective, probabilistic framework built on 3D toolmark data from consecutively manufactured screwdrivers. It combines a data-driven clustering step (PAM) to understand variability by source with density-based separation of Known-Match and Known-Non-Match pairs, and finally derives likelihood ratios via Beta-distributed densities to provide interpretable evidence metrics. The approach achieves high cross-validated performance (e.g., $0.98$ sensitivity and $0.96$ specificity in the primary experiment) and identifies a practical signal-length threshold (approximately $1.5$ mm) below which reliable classification is unlikely, while remaining robust to angle/direction changes within the studied range. The proposed open-source pipeline and datasets enable forensic examiners to produce transparent, LR-based conclusions and pave the way for broader generalization to other tools, contingent on expanded data collection.
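As a sketch of the likelihood-ratio step, write $s$ for the similarity score of a toolmark pair, assumed here to lie in $(0, 1)$. The notation is illustrative and not taken from the paper: suppose the Known-Match (KM) and Known-Non-Match (KNM) score densities are Beta with parameters $(\alpha_{\mathrm{KM}}, \beta_{\mathrm{KM}})$ and $(\alpha_{\mathrm{KNM}}, \beta_{\mathrm{KNM}})$. The likelihood ratio would then take the form

$$
\mathrm{LR}(s) = \frac{f(s;\,\alpha_{\mathrm{KM}}, \beta_{\mathrm{KM}})}{f(s;\,\alpha_{\mathrm{KNM}}, \beta_{\mathrm{KNM}})},
\qquad
f(s;\,\alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, s^{\alpha-1}(1-s)^{\beta-1}.
$$

Under this reading, $\mathrm{LR}(s) > 1$ supports the same-source hypothesis and $\mathrm{LR}(s) < 1$ supports the different-source hypothesis.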
Abstract
Forensic toolmark analysis traditionally relies on subjective human judgment, leading to inconsistencies and a lack of transparency. The many variables involved, including the angle and direction in which a mark is made, further complicate comparisons. To address this, we first generate a dataset of 3D toolmarks made at various angles and directions using consecutively manufactured slotted screwdrivers. Using PAM (Partitioning Around Medoids) clustering, we find that the marks cluster by tool rather than by angle or direction. Using Known Match and Known Non-Match densities, we establish thresholds for classification. Fitting Beta distributions to these densities allows us to derive likelihood ratios for new toolmark pairs. With a cross-validated sensitivity of 98% and specificity of 96%, our approach enhances the reliability of toolmark analysis. The approach applies to slotted screwdrivers and to screwdrivers made with a similar production method. With data collected for other tools and factors, it could be extended to compare other types of toolmarks. This empirically trained, open-source solution offers forensic examiners a standardized means of comparing toolmarks objectively, potentially reducing the number of miscarriages of justice in the legal system.
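The Beta-fitting and likelihood-ratio step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several things in it are assumptions: the similarity scores are synthetic draws in $(0, 1)$, the Beta parameters are estimated by the method of moments (the paper's fitting procedure is not stated in this section), and the function names `fit_beta_moments`, `beta_log_pdf`, and `likelihood_ratio` are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def fit_beta_moments(scores):
    """Method-of-moments estimate of Beta(a, b) for scores inside (0, 1).

    Assumption: this estimator stands in for whatever fitting method the
    paper actually uses.
    """
    m = fmean(scores)
    v = variance(scores)  # sample variance
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("sample moments are not compatible with a Beta fit")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


def beta_log_pdf(x, a, b):
    """Log-density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def likelihood_ratio(score, km_params, knm_params):
    """Ratio of the Known-Match density to the Known-Non-Match density."""
    log_lr = beta_log_pdf(score, *km_params) - beta_log_pdf(score, *knm_params)
    return math.exp(log_lr)


# Synthetic stand-ins for similarity scores (NOT data from the paper):
# Known-Match pairs score high, Known-Non-Match pairs score low.
rng = random.Random(0)
km_scores = [rng.betavariate(8, 2) for _ in range(500)]
knm_scores = [rng.betavariate(2, 8) for _ in range(500)]

km_params = fit_beta_moments(km_scores)
knm_params = fit_beta_moments(knm_scores)

# Likelihood ratios for a few hypothetical new toolmark pairs.
for s in (0.1, 0.5, 0.9):
    print(f"score={s:.1f}  LR={likelihood_ratio(s, km_params, knm_params):.3g}")
```

Working with log-densities keeps the ratio numerically stable when one density is very small, which is the typical case for scores far from the overlap between the two distributions.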
