Forensic Activity Classification Using Digital Traces from iPhones: A Machine Learning-based Approach
Conor McCarthy, Jan Peter van Zandwijk, Marcel Worring, Zeno Geradts
TL;DR
This paper develops a machine learning pipeline that converts timestamped iPhone digital traces into likelihood ratios (LRs) for forensic activity evidence. Using CatBoost for scoring and logistic calibration, it yields informative LR systems for 167 of the 171 possible binary activity pairs, and extends the approach to multiclass classification and timeline generation. The authors provide extensive evaluation with calibration and discrimination metrics, reveal key predictive variables, and demonstrate practical utility through timelines and semantic groupings. The NFI_FARED dataset and accompanying code are publicly released to facilitate replication and further research in digital-forensic activity analysis.
Abstract
Smartphones and smartwatches are ever-present in daily life and provide a rich source of information on their users' behaviour. In particular, digital traces derived from the phone's embedded movement sensors present an opportunity for a forensic investigator to gain insight into a person's physical activities. In this work, we present a machine learning-based approach to translate digital traces into likelihood ratios (LRs) for different types of physical activities. Evaluated on a new dataset, NFI_FARED, which contains digital traces from four different types of iPhones labelled with 19 activities, our approach produced useful LR systems for distinguishing 167 of the 171 possible activity pairings. The same approach was extended to analyse likelihoods for multiple activities (or groups of activities) simultaneously and to create activity timelines that aid both the early and later stages of forensic investigations. The dataset and all code required to replicate the results have been made public to encourage further research on this topic.
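To make the score-to-LR step concrete, the following is a minimal sketch of logistic calibration for a single activity pair. It is an illustration under stated assumptions rather than the authors' implementation: the scores are made-up toy values standing in for classifier outputs (the paper uses CatBoost), the calibration is fitted with plain gradient descent, and the names `fit_logistic_calibration` and `log10_lr` are hypothetical. It also checks that 19 activities give 171 unordered pairs.

```python
import math
from itertools import combinations

# 19 labelled activities give C(19, 2) unordered activity pairs.
n_pairs = len(list(combinations(range(19), 2)))
print(n_pairs)  # 171


def fit_logistic_calibration(scores, labels, step=0.1, epochs=2000):
    """Fit P(H1 | s) = sigmoid(a * s + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= step * grad_a / n
        b -= step * grad_b / n
    return a, b


def log10_lr(score, a, b, prior_h1):
    """Turn a score into a log10 LR for H1 versus H2.

    The fitted model gives posterior log-odds; subtracting the log-odds of
    the prior used in the calibration data leaves the log likelihood ratio.
    """
    posterior_log_odds = a * score + b
    prior_log_odds = math.log(prior_h1 / (1.0 - prior_h1))
    return (posterior_log_odds - prior_log_odds) / math.log(10)


# Toy calibration set (assumed values): label 1 = activity H1, label 0 = activity H2.
scores = [0.5, 1.0, 1.5, 2.0, -0.2, -0.5, -1.0, -1.5, -2.0, 0.2]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
prior_h1 = sum(labels) / len(labels)  # proportion of H1 in the calibration data

a, b = fit_logistic_calibration(scores, labels)
for s in (-2.0, 0.0, 2.0):
    print(s, round(log10_lr(s, a, b, prior_h1), 3))
```

A positive log10 LR supports the first activity of the pair and a negative one supports the second. In this sketch the prior correction vanishes because the toy calibration set is balanced; with unbalanced data it shifts every LR by the same factor.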
