Forensic Activity Classification Using Digital Traces from iPhones: A Machine Learning-based Approach

Conor McCarthy, Jan Peter van Zandwijk, Marcel Worring, Zeno Geradts

TL;DR

This paper develops a machine learning pipeline that converts timestamped iPhone digital traces into likelihood ratios for forensic activity evidence. Using CatBoost for scoring and logistic calibration, it shows that 167 of 171 binary activity pairs are informative, and extends the approach to multiclass classifications and timeline generation. The authors provide extensive evaluation with calibration and discrimination metrics, reveal key predictive variables, and demonstrate practical utility through timelines and semantic groupings. The NFI_FARED dataset and accompanying code are publicly released to facilitate replication and further research in digital-forensic activity analysis.

Abstract

Smartphones and smartwatches are ever-present in daily life, and provide a rich source of information on their users' behaviour. In particular, digital traces derived from the phone's embedded movement sensors present an opportunity for a forensic investigator to gain insight into a person's physical activities. In this work, we present a machine learning-based approach to translate digital traces into likelihood ratios (LRs) for different types of physical activities. Evaluating on a new dataset, NFI_FARED, which contains digital traces from four different types of iPhones labelled with 19 activities, it was found that our approach could produce useful LR systems to distinguish 167 out of a possible 171 activity pairings. The same approach was extended to analyse likelihoods for multiple activities (or groups of activities) simultaneously and create activity timelines to aid in both the early and later stages of forensic investigations. The dataset and all code required to replicate the results have also been made public to encourage further research on this topic.
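
The figure of 171 possible pairings follows from choosing 2 of the 19 labelled activities. A minimal sketch of that count is given below; the activity labels are hypothetical placeholders, not the dataset's actual label set.

```python
from itertools import combinations
from math import comb

N_ACTIVITIES = 19  # number of labelled activities reported in the abstract

# Placeholder labels (assumption): the real NFI_FARED names are not reproduced here.
activities = [f"activity_{i:02d}" for i in range(N_ACTIVITIES)]

# Every unordered pair of distinct activities is one candidate binary LR system.
pairs = list(combinations(activities, 2))

print(len(pairs))  # 171
```

The same number is given directly by the binomial coefficient `comb(19, 2)`.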

Paper Structure

This paper contains 26 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Overview of the method used to train and evaluate an LR system with the proposed approach. a) Subjects are split between training (green) and validation (purple) to prevent data leakage. b) Data from training subjects is collected into the training set, containing the activity classes of interest, which is used to train the scorer (CatBoost; Prokhorenkova et al., 2018). After training, the scores for the entire training set are calculated to produce the training score distribution, on which we fit the calibrator. The combination of scorer + calibrator is the LR system. c) Validation data is taken from the validation subjects and fed through the LR system from b. The resulting likelihood ratios can then be evaluated using $C_{llr}$, PAV plots, Tippett plots, etc.
  • Figure 2: $C_{llr}$ and $C_{llr}^{min}$ for LR systems produced from each combination of two activity classes. Darker colours indicate lower (better) values. Each LR system is produced with $H_1$=row activity, $H_2$=column activity. Diagonal values in green are the mean $C_{llr}$ for an activity across all LR systems in which it is included.
  • Figure 3: Further analysis of the LR systems with the lowest (left column: (running, tram)) and highest (right column: (train, tram)) $C_{llr}$ values.
  • Figure 4: Heatmap displaying average variable importances for each activity class.
  • Figure 5: UpSet plot (Lex et al., 2014) of $\hat{C}_{mxe}$ for all unique combinations of activity groupings.
  • ...and 2 more figures
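
Several captions above report $C_{llr}$, the log-likelihood-ratio cost. As a rough illustration, the sketch below computes it from its standard definition, given LRs from samples where $H_1$ is true and samples where $H_2$ is true. The function name and example values are assumptions made for illustration; this is not the authors' released code.

```python
import math


def cllr(lrs_h1, lrs_h2):
    """Log-likelihood-ratio cost for LRs from H1-true and H2-true samples.

    Standard definition:
      0.5 * (mean(log2(1 + 1/LR)) over H1-true + mean(log2(1 + LR)) over H2-true)
    """
    if not lrs_h1 or not lrs_h2:
        raise ValueError("need at least one LR under each hypothesis")
    # Penalise small LRs when H1 is actually true.
    penalty_h1 = sum(math.log2(1 + 1 / lr) for lr in lrs_h1) / len(lrs_h1)
    # Penalise large LRs when H2 is actually true.
    penalty_h2 = sum(math.log2(1 + lr) for lr in lrs_h2) / len(lrs_h2)
    return 0.5 * (penalty_h1 + penalty_h2)


# Hypothetical values: a system that always outputs LR = 1 carries no information.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
# Hypothetical values: large LRs under H1 and small LRs under H2.
informative = cllr([50.0, 200.0], [0.02, 0.005])

print(uninformative)  # 1.0
print(informative < uninformative)  # True
```

A value of 1 corresponds to a system that always reports LR = 1, so lower values indicate a more useful system, matching the "lower (better)" reading in the Figure 2 caption.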