GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors
Ho Lyun Jeong, Ziqi Wang, Colin Samplawski, Jason Wu, Shiwei Fang, Lance M. Kaplan, Deepak Ganesan, Benjamin Marlin, Mani Srivastava
TL;DR
This work introduces GDTM, a nine-hour indoor geospatial tracking dataset collected with distributed multimodal sensor nodes and reconfigurable viewpoints to enable robust multimodal fusion research under challenging illumination and deployment variations. It provides synchronized streams from vision, depth, radar, and audio modalities, with millimeter-level OptiTrack ground truth and extensive viewpoint diversity for studying the domain shift caused by sensor placement. The authors present two baselines: a comparison of fusion architectures, and a cross-view robustness study that uses local-to-global coordinate transformations. These baselines demonstrate that multimodal fusion improves tracking under adverse conditions and that viewpoint variability can be mitigated with depth, radar, and audio signals. The dataset and baselines aim to advance deployable indoor tracking systems for smart buildings, with publicly available code and data to spur further methodological development.
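To make the local-to-global coordinate transformation mentioned above concrete, the following is a minimal sketch and not the authors' implementation. It assumes each sensor node's pose can be described by a position in the shared global frame plus a yaw angle about the vertical axis; the function names `local_to_global` and `global_to_local` and their parameters are hypothetical and do not come from the GDTM code base.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def local_to_global(point_local: Point3, node_position: Point3, node_yaw_rad: float) -> Point3:
    """Map a point from a sensor node's local frame into the shared global frame.

    Assumption (illustrative only): the node pose is a yaw rotation about the
    vertical z axis followed by a translation to the node's global position.
    """
    x, y, z = point_local
    c, s = math.cos(node_yaw_rad), math.sin(node_yaw_rad)
    # Rotate in the horizontal plane, then translate by the node position.
    gx = c * x - s * y + node_position[0]
    gy = s * x + c * y + node_position[1]
    gz = z + node_position[2]
    return (gx, gy, gz)


def global_to_local(point_global: Point3, node_position: Point3, node_yaw_rad: float) -> Point3:
    """Inverse mapping: express a global point in a node's local frame."""
    dx = point_global[0] - node_position[0]
    dy = point_global[1] - node_position[1]
    dz = point_global[2] - node_position[2]
    c, s = math.cos(node_yaw_rad), math.sin(node_yaw_rad)
    # Apply the transpose (inverse) of the yaw rotation to the offset.
    return (c * dx + s * dy, -s * dx + c * dy, dz)


# Hypothetical node placed at (2, 3, 0.5) m and rotated 90 degrees about z.
estimate_global = local_to_global((1.0, 0.0, 0.0), (2.0, 3.0, 0.5), math.pi / 2)
```

Under these assumptions, a model that predicts object positions relative to its own node can be moved to a new viewpoint by updating only the node pose, and predictions from several nodes become directly comparable once they are expressed in the global frame.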
Abstract
Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for multimodal object tracking with distributed multimodal sensors and reconfigurable sensor node placements. Our dataset enables the exploration of several research problems, such as optimizing architectures for processing multimodal data, and investigating models' robustness to adverse sensing conditions and sensor placement variances. A GitHub repository containing the code, sample data, and checkpoints of this work is available at https://github.com/nesl/GDTM.
