Enhancing Binary Encoded Crime Linkage Analysis Using Siamese Network
Yicheng Zhan, Fahim Ahmed, Amy Burrell, Matthew J. Tonkin, Sarah Galambos, Jessica Woodhams, Dalal Alrajeh
TL;DR
The paper tackles crime linkage in high-dimensional, sparse binary data by proposing a Siamese Autoencoder that integrates geographic-temporal information at the decoder stage to amplify pairwise signals. It jointly optimizes a contrastive loss and a reconstruction loss to produce robust latent embeddings, and combines this with domain-informed data reduction that preserves semantics while reducing dimensionality. Evaluations on the Violent Crime Linkage Analysis System (ViCLAS) dataset show consistent improvements over baselines, with relative AUC gains of up to 9% and substantial reductions in investigative workload in out-of-time tests. The approach offers investigators a practical, interpretable tool, together with guidelines for data preprocessing, ethical safeguards, and adaptation for cross-jurisdictional deployment.
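The joint objective can be illustrated with a small sketch. The exact loss forms are not given here, so this assumes a standard margin-based contrastive loss on the Euclidean distance between the two latent embeddings, a binary cross-entropy reconstruction term suited to binary-encoded crime vectors, and a hypothetical weighting parameter `alpha`. All function names are illustrative, not the authors' implementation.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two latent embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(z1, z2, linked: bool, margin: float = 1.0) -> float:
    """Assumed margin-based contrastive loss: pull linked pairs together,
    push unlinked pairs at least `margin` apart."""
    d = euclidean(z1, z2)
    if linked:
        return d ** 2
    return max(0.0, margin - d) ** 2


def bce_reconstruction(x, x_hat, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a binary crime vector `x`
    and the decoder's per-feature probabilities `x_hat`."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        p = min(max(pi, eps), 1 - eps)  # clip to avoid log(0)
        total += -(xi * math.log(p) + (1 - xi) * math.log(1 - p))
    return total / len(x)


def joint_loss(z1, z2, x1, x1_hat, x2, x2_hat, linked,
               alpha: float = 0.5, margin: float = 1.0) -> float:
    """Hypothetical weighting: alpha * contrastive + (1 - alpha) * mean reconstruction
    over both crimes in the pair."""
    rec = 0.5 * (bce_reconstruction(x1, x1_hat) + bce_reconstruction(x2, x2_hat))
    return alpha * contrastive_loss(z1, z2, linked, margin) + (1 - alpha) * rec


# Usage: identical embeddings cost nothing for a linked pair,
# but incur the full margin penalty for an unlinked pair.
print(contrastive_loss([0.2, 0.4], [0.2, 0.4], linked=True))
print(contrastive_loss([0.2, 0.4], [0.2, 0.4], linked=False))
```

The reconstruction term keeps the embedding faithful to each crime's own features, while the contrastive term shapes distances between pairs; `alpha` trades one off against the other.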
Abstract
Effective crime linkage analysis is crucial for identifying serial offenders and enhancing public safety. To address the limitations of traditional crime linkage methods in handling high-dimensional, sparse, and heterogeneous data, we propose a Siamese Autoencoder framework that learns meaningful latent representations and uncovers correlations in complex crime data. Using data from the Violent Crime Linkage Analysis System (ViCLAS), maintained by the Serious Crime Analysis Section of the UK's National Crime Agency, our approach mitigates signal dilution in sparse feature spaces by integrating geographic-temporal features at the decoder stage. This design amplifies behavioral representations rather than allowing them to be overshadowed at the input level, yielding consistent improvements across multiple evaluation metrics. We further analyze how different domain-informed data reduction strategies influence model performance, providing practical guidance for preprocessing in crime linkage contexts. Our results show that advanced machine learning approaches can substantially enhance linkage accuracy, improving AUC by up to 9% (relative) over traditional methods while offering interpretable insights to support investigative decision-making.
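The decoder-stage integration can be sketched as a toy forward pass in plain Python. This is not the authors' architecture: the layer sizes, the `tanh` and sigmoid activations, the class name `DecoderFusionAE`, and the assumption that each crime carries a small vector of geographic-temporal features are all hypothetical. What it illustrates is structural: the encoder sees only the sparse behavioral vector, and the geographic-temporal features are concatenated to the latent code just before decoding, so they cannot dilute the behavioral input.

```python
import math
import random
from typing import List, Tuple


def dense(x: List[float], w: List[List[float]], b: List[float]) -> List[float]:
    """Fully connected layer: W @ x + b, with W stored as one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(v: List[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def init(n_out: int, n_in: int, rng: random.Random):
    """Small random weights and zero biases (hypothetical initialisation)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class DecoderFusionAE:
    """Toy autoencoder: behavioral features in, geographic-temporal features
    joined to the latent code only at the decoder."""

    def __init__(self, n_behav: int, n_geo: int, n_latent: int, seed: int = 0):
        rng = random.Random(seed)
        self.enc_w, self.enc_b = init(n_latent, n_behav, rng)
        self.dec_w, self.dec_b = init(n_behav, n_latent + n_geo, rng)

    def encode(self, x_behav: List[float]) -> List[float]:
        # Encoder input: binary behavioral vector only.
        return [math.tanh(t) for t in dense(x_behav, self.enc_w, self.enc_b)]

    def decode(self, z: List[float], geo: List[float]) -> List[float]:
        # Decoder input: latent code concatenated with geographic-temporal features.
        return sigmoid(dense(z + list(geo), self.dec_w, self.dec_b))

    def forward(self, x_behav: List[float], geo: List[float]) -> Tuple[List[float], List[float]]:
        z = self.encode(x_behav)
        return z, self.decode(z, geo)


# Usage: the latent code is unaffected by geo features; the reconstruction is not.
model = DecoderFusionAE(n_behav=6, n_geo=2, n_latent=3)
z, x_hat = model.forward([1, 0, 0, 1, 0, 0], [0.5, -0.2])
print(len(z), len(x_hat))
```

In a Siamese setup, both crimes in a candidate pair would pass through the same model instance, so encoder and decoder weights are shared, and the two latent codes would feed a pairwise loss such as a contrastive term.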
