GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors

Ho Lyun Jeong, Ziqi Wang, Colin Samplawski, Jason Wu, Shiwei Fang, Lance M. Kaplan, Deepak Ganesan, Benjamin Marlin, Mani Srivastava

TL;DR

This work introduces GDTM, a nine-hour indoor geospatial tracking dataset collected with distributed multimodal sensor nodes whose placements can be reconfigured, enabling research on multimodal fusion that is robust to challenging illumination and deployment variations. It provides synchronized vision, depth, radar, and audio streams, with millimeter-level OptiTrack ground truth and enough viewpoint diversity to study domain shift caused by sensor placement. The authors present two baselines: a comparison of fusion architectures, and a cross-view robustness study that uses local-to-global coordinate transformations. The baselines demonstrate that multimodal fusion improves tracking under adverse conditions and that viewpoint variability can be mitigated with depth, radar, and audio signals. The dataset and baselines aim to advance deployable indoor tracking systems for smart buildings, and the code and data are publicly available to spur further methodological development.
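To make the local-to-global coordinate transformation mentioned above concrete, here is a minimal sketch of mapping a position estimated in one sensor node's own frame into a shared global frame. It assumes a simplified 2D setting in which each node's pose is a yaw angle plus a position; the function name `local_to_global` and the example pose are hypothetical and are not taken from the GDTM code.

```python
import math


def local_to_global(point_local, node_yaw_rad, node_position):
    """Map a 2D point from a node's local frame to the global frame.

    Hypothetical helper: assumes the node pose is a yaw angle (radians)
    plus a 2D position, both expressed in the global frame.
    """
    x, y = point_local
    c, s = math.cos(node_yaw_rad), math.sin(node_yaw_rad)
    # Rotate into the global orientation, then translate by the node position.
    gx = c * x - s * y + node_position[0]
    gy = s * x + c * y + node_position[1]
    return (gx, gy)


# A node at (2.0, 1.0), rotated 90 degrees, sees a target 1 m along its local +x axis.
estimate = local_to_global((1.0, 0.0), math.pi / 2, (2.0, 1.0))
```

With this pose, the target lands at approximately (2.0, 2.0) in the global frame. Expressing every node's estimate in one frame is what allows estimates from differently placed nodes to be compared or fused.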

Abstract

Continuously locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for multimodal object tracking with distributed multimodal sensors and reconfigurable sensor node placements. Our dataset enables the exploration of several research problems, such as optimizing architectures for processing multimodal data and investigating models' robustness to adverse sensing conditions and variations in sensor placement. A GitHub repository containing the code, sample data, and checkpoints of this work is available at https://github.com/nesl/GDTM.
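As an illustration of what time-aligned data from several sensor types involves, the sketch below pairs each sample of a reference stream with the nearest-in-time sample of another stream. This is a generic nearest-timestamp matching scheme shown under stated assumptions, not the synchronization procedure used to build GDTM; the function names, the sample timestamps, and the 50 ms tolerance are all hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the entry in sorted, non-empty `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(reference_ts, other_ts, max_gap):
    """Pair each reference timestamp with the closest sample of another stream.

    Returns one entry per reference timestamp: an index into `other_ts`,
    or None when the closest sample is more than `max_gap` seconds away.
    """
    pairs = []
    for t in reference_ts:
        j = nearest_index(other_ts, t)
        pairs.append(j if abs(other_ts[j] - t) <= max_gap else None)
    return pairs


# Hypothetical timestamps in seconds for two streams from one node.
camera_ts = [0.00, 0.10, 0.20, 0.30]
radar_ts = [0.02, 0.11, 0.33, 0.50]
matches = align(camera_ts, radar_ts, max_gap=0.05)
```

Here the third camera frame has no radar sample within the tolerance, so it is left unmatched rather than paired with stale data.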

Paper Structure (12 sections, 1 equation, 7 figures, 4 tables)

Figures (7)

  • Figure 1: One of our multimodal sensor data collection nodes.
  • Figure 2: Sample data collected by one of the 3 sensing nodes.
  • Figure 3: Data collection session setup: (a) racetrack dimensions (b) experiment site (c) an exemplar circular trajectory (d) an exemplar random trajectory.
  • Figure 4: Multimodal sensor fusion architecture.
  • Figure 5: Average tracking error of the four model variants under different lighting conditions.
  • ...and 2 more figures