CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching

Samia Shafique, Shu Kong, Charless Fowlkes

TL;DR

CriSp addresses forensic shoeprint matching by reframing the problem as cross-domain retrieval against tread-depth maps predicted from online tread images. The method combines a data augmentation module, a spatial encoder, and a masking mechanism to learn region-consistent representations, trained with supervised contrastive learning. Leveraging large-scale online tread data and two crime-scene benchmarks, CriSp achieves state-of-the-art performance on automated shoeprint matching and cross-domain image retrieval, while providing scalable inference via precomputed database features. The work advances forensic retrieval by enabling depth-map–based matching with partial visibility handling and region-aware localization, though it also discusses ethical considerations and limitations for real-world deployment.

Abstract

Shoeprints are a common type of evidence found at crime scenes and are used regularly in forensic investigations. However, existing methods cannot effectively employ deep learning techniques to match noisy and occluded crime-scene shoeprints to a shoe database due to a lack of training data. Moreover, all existing methods match crime-scene shoeprints to clean reference prints, yet our analysis shows matching to more informative tread depth maps yields better retrieval results. The matching task is further complicated by the necessity to identify similarities only in corresponding regions (heels, toes, etc.) of prints and shoe treads. To overcome these challenges, we leverage shoe tread images from online retailers and utilize an off-the-shelf predictor to estimate depth maps and clean prints. Our method, named CriSp, matches crime-scene shoeprints to tread depth maps by training on this data. CriSp incorporates data augmentation to simulate crime-scene shoeprints, an encoder to learn spatially aware features, and a masking module to ensure only visible regions of crime-scene prints affect retrieval results. To validate our approach, we introduce two validation sets by reprocessing existing datasets of crime-scene shoeprints and establish a benchmarking protocol for comparison. On this benchmark, CriSp significantly outperforms state-of-the-art methods in both automated shoeprint matching and image retrieval tailored to this task.
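
The idea that only visible regions of a crime-scene print should affect retrieval can be illustrated with a small sketch. This is not the authors' implementation: the function names `masked_similarity` and `retrieve`, the feature shapes, and the choice of cosine similarity are all assumptions made for illustration. It shows one plausible way to rank precomputed database features against a query while ignoring occluded spatial cells.

```python
import numpy as np

def masked_similarity(query_feat, db_feats, mask):
    """Cosine similarity restricted to visible spatial cells (hypothetical helper).

    query_feat: (C, H, W) spatial features of a crime-scene print
    db_feats:   (N, C, H, W) precomputed features of tread depth maps
    mask:       (H, W), 1 where the print is visible, 0 elsewhere
    """
    m = mask[None, :, :].astype(query_feat.dtype)
    # Zero out features at cells the mask marks as not visible.
    q = (query_feat * m).ravel()
    d = (db_feats * m[None]).reshape(db_feats.shape[0], -1)
    q_norm = np.linalg.norm(q) + 1e-8
    d_norm = np.linalg.norm(d, axis=1) + 1e-8
    return (d @ q) / (d_norm * q_norm)

def retrieve(query_feat, db_feats, mask, top_k=5):
    """Return indices and scores of the top_k database entries, best first."""
    scores = masked_similarity(query_feat, db_feats, mask)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Toy data: 10 database entries with 4-channel features on an 8x4 grid.
rng = np.random.default_rng(0)
db_feats = rng.normal(size=(10, 4, 8, 4))

# Assume only the top half of the query print is visible.
mask = np.zeros((8, 4))
mask[:4] = 1

# The query agrees with entry 3 on the visible half and is corrupted elsewhere.
query = db_feats[3].copy()
query[:, 4:, :] = rng.normal(size=(4, 4, 4))

ranked, scores = retrieve(query, db_feats, mask)
```

Because the corrupted half is masked out, entry 3 is ranked first in this toy setup. Database features are computed once and reused for every query, which is the property that makes precomputed-feature retrieval scale.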

Paper Structure (26 sections, 3 equations, 11 figures, 8 tables)

This paper contains 26 sections, 3 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Our method CriSp compares crime-scene shoeprints against a database of tread depth maps (predicted from tread images available at online retailers) and retrieves a ranked list of matches. We train CriSp using tread depth maps and clean prints (\ref{sec:datasets}). We use a data augmentation module $Aug$ to address the domain gap between clean and crime-scene prints, and a spatial feature masking strategy (via spatial encoder $Enc$ and masking module $M$) to match shoeprint patterns to corresponding locations on tread depth maps (\ref{sec:methodology}). CriSp significantly outperforms previous methods (\ref{sec:experiments}).
  • Figure 2: Examples from train-set. We create training data from online retailers and prepare their annotations by predicting their depth maps and prints \cite{shafique2022shoerinsics}, although the depth and print predictions are sometimes inaccurate (2nd and 3rd shoe).
  • Figure 3: Dataset statistics. We have a reference database (ref-db) and two validation sets (val-FID and val-ShoeCase) with crime-scene impressions to query against ref-db. We use a section of ref-db for training (train-set) and leave the rest to study generalization. Ground-truth labels from our validation sets connect our query crime-scene shoeprints to shoes in ref-db. See details in \ref{sec:datasets} and visual examples in \ref{fig:training_dataset} and \ref{fig:testing_dataset}.
  • Figure 4: Examples from val-FID and val-ShoeCase. Val-FID contains real crime-scene prints (FID-crime) and clean, fully visible lab impressions (FID-clean). We show FID-crime and FID-clean shoeprints corresponding to the same shoe models for easier comparison. Note that we show a yellow shoe outline on the FID-crime prints for visualization purposes and the outline does not exist in FID-crime images. Val-ShoeCase contains simulated crime-scene shoeprints on blood (ShoeCase-blood) and dust (ShoeCase-dust). All val-ShoeCase prints are full-sized, as opposed to val-FID.
  • Figure 5: Examples of data augmentation. Our data augmentation module $Aug$ simulates crime-scene shoeprints (cf. \ref{fig:testing_dataset}) from clean, fully visible prints in our training set (cf. \ref{fig:training_dataset}). $Aug$ optionally (1) introduces occlusion such as overlapping prints and random shapes, (2) erases parts of the print to create a grainy appearance, and (3) adds noise to mimic background clutter.
  • ...and 6 more figures
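
The three optional steps in the Figure 5 caption (occlusion, grainy erasure, background noise) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's $Aug$ module: the function name `augment_print`, the rectangular occluder standing in for "random shapes", the 30% erase rate, and the Gaussian noise level are all hypothetical choices.

```python
import numpy as np

def augment_print(clean, rng, occlude=True, erase=True, add_noise=True):
    """Roughly simulate a degraded print from a clean one (hypothetical sketch).

    clean: (H, W) array in [0, 1], where larger values mean stronger contact.
    rng:   a numpy.random.Generator used for all random choices.
    """
    out = clean.astype(np.float64).copy()
    h, w = out.shape

    if occlude:
        # Assumption: a blank rectangle stands in for an occluding shape.
        rh = rng.integers(h // 8, h // 2)
        rw = rng.integers(w // 8, w // 2)
        top = rng.integers(0, h - rh)
        left = rng.integers(0, w - rw)
        out[top:top + rh, left:left + rw] = 0.0

    if erase:
        # Assumption: dropping about 30% of pixels gives a grainy look.
        keep = rng.random((h, w)) > 0.3
        out = out * keep

    if add_noise:
        # Assumption: additive Gaussian noise mimics background clutter.
        out = out + rng.normal(0.0, 0.1, size=(h, w))

    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = np.ones((64, 32))           # toy "fully visible" print
degraded = augment_print(clean, rng)
unchanged = augment_print(clean, rng, occlude=False, erase=False, add_noise=False)
```

Each step is behind its own flag, mirroring the caption's statement that the degradations are applied optionally. With all flags off, the input passes through unchanged.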