NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks

Chris Stanford; Suman Adari; Xishun Liao; Yueshuai He; Qinhua Jiang; Chenchen Kuai; Jiaqi Ma; Emmanuel Tung; Yinlong Qian; Lingyi Zhao; Zihao Zhou; Zeeshan Rasheed; Khurram Shafique

TL;DR

This work introduces NUMOSIM, a synthetic mobility dataset that provides a controlled, ethical, and diverse environment for benchmarking anomaly detection techniques. The dataset is openly available, along with comprehensive documentation, evaluation metrics, and benchmark results.

Abstract

Collecting real-world mobility data is challenging. It is often fraught with privacy concerns, logistical difficulties, and inherent biases. Moreover, accurately annotating anomalies in large-scale data is nearly impossible, as it demands meticulous effort to distinguish subtle and complex patterns. These challenges significantly impede progress in geospatial anomaly detection research by restricting access to reliable data and complicating the rigorous evaluation, comparison, and benchmarking of methodologies. To address these limitations, we introduce a synthetic mobility dataset, NUMOSIM, that provides a controlled, ethical, and diverse environment for benchmarking anomaly detection techniques. NUMOSIM simulates a wide array of realistic mobility scenarios, encompassing both typical and anomalous behaviors, generated through advanced deep learning models trained on real mobility data. This approach allows NUMOSIM to accurately replicate the complexities of real-world movement patterns while strategically injecting anomalies to challenge and evaluate detection algorithms based on how effectively they capture the interplay between demographic, geospatial, and temporal factors. Our goal is to advance geospatial mobility analysis by offering a realistic benchmark for improving anomaly detection and mobility modeling techniques. To support this, we provide open access to the NUMOSIM dataset, along with comprehensive documentation, evaluation metrics, and benchmark results.

Paper Structure (28 sections, 1 equation, 5 figures, 6 tables)

This paper contains 28 sections, 1 equation, 5 figures, 6 tables.

Introduction
Prior Work
Location-Based Services (LBS) Data
Synthetic Data
Vehicle Data
Present Work
Methodology
Anomaly Injection
Benchmarks
Ongoing Releases
File Descriptions
Supplemental Files
Stay point files
Conclusion
Baseline Method Adaptations
...and 13 more sections

Figures (5)

Figure 1: Area of interest covered in the simulation. All agents stay within the boundary during the 8-week period. Map data from OpenStreetMap.
Figure 2: Example of a recurring anomaly. Left: the unaltered, normal test period for an agent, represented as a calendar. Each box represents one day, and each row within a box represents a six-hour period (as demonstrated in the top-left box). Right: the altered, anomalous test period for the same agent. The injected anomalies (marked with the star hatch pattern) are visits to a location that is new for that agent (cyan), recurring at the same time of day each time. The surrounding visits are also shifted in time to accommodate the injected visits and are therefore also considered anomalous.
Figure 3: Example of normal (Left) and anomalous (Right) daily stay point sequences for the same agent as in Figure 2, on Test Week 2, Day 4 (Thursday).
Figure 4: Illustration of the RioBUS CNN architecture adapted to the multi-agent GPS staypoint trajectory. The input to the network is 20 GPS coordinates of a single agent that are coded in three channels, which correspond to the timestamp, latitude, and longitude of each GPS coordinate. The input can be optionally extended to four channels by the additional stay duration feature. This input is fed to two convolutional layer blocks. The CNN output is then concatenated with an optional POI embedding. The last layer predicts the agent ID to which the input 20 GPS coordinates belong.
Figure 5: Illustration of the Point Activity Classifier (of STOD) adapted to the multi-agent GPS staypoint trajectory. The input to the network is 21 GPS coordinates with some extra features, which correspond to the timestamp, latitude, longitude, POI types, and stay duration of each GPS coordinate. The input is split into the left window, the coordinate of interest, and the right window. All other inputs are used to predict the POI type of the coordinate of interest.
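
The input layouts described in the captions of Figures 4 and 5 can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `to_channels` and `split_context`, the field order of a stay point, the symmetric 10/1/10 split of the 21-point window, and the sample coordinates are all assumptions made for illustration. The captions only state the window lengths (20 and 21) and the channels involved.

```python
from typing import List, Sequence, Tuple

# One stay point as (timestamp, latitude, longitude).
# This field order is an assumption, not taken from the dataset files.
StayPoint = Tuple[float, float, float]


def to_channels(window: Sequence[StayPoint]) -> List[List[float]]:
    """Transpose a window of stay points into channel-major form:
    [timestamps, latitudes, longitudes], each as long as the window."""
    return [list(channel) for channel in zip(*window)]


def split_context(
    window: Sequence[StayPoint],
) -> Tuple[List[StayPoint], StayPoint, List[StayPoint]]:
    """Split an odd-length window into (left, center, right).
    A symmetric split is an assumption; Figure 5 only names the three parts."""
    if len(window) % 2 == 0:
        raise ValueError("window length must be odd")
    mid = len(window) // 2
    return list(window[:mid]), window[mid], list(window[mid + 1:])


# Made-up stay points, used only to show the shapes involved.
points = [(float(i), 34.0 + 0.01 * i, -118.0 - 0.01 * i) for i in range(21)]

# Figure 4 style input: 20 points arranged as three channels.
channels = to_channels(points[:20])

# Figure 5 style input: 21 points split around the coordinate of interest.
left, center, right = split_context(points)
```

Adding the optional stay duration feature from Figure 4 would amount to extending each tuple with a fourth field, which `to_channels` would then return as a fourth channel.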
