
GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li

TL;DR

GSLAMOT presents a unified framework for simultaneous localization, mapping, and 3D multi-object tracking in dynamic environments by integrating a Tracklet Graph (TG) and a Query Graph (QG). The core contributions are the Multi-criteria Star Graph Association (MSGA) for robust data association and Object-centric Graph Optimization (OGO) with two sliding windows (OCOW and OEFW) to fuse ego-motion, map points, and object trajectories in real time. Experiments on KITTI, Waymo, and a synthetic Traffic Congestion dataset demonstrate improved MOT accuracy and SLAM robustness under unknown ego-motion and congested scenes, with real-time performance via multi-threading. The work provides a principled approach to dynamic-scene perception and offers a new dataset (TCD) to benchmark highly dynamic scenarios. Overall, GSLAMOT advances joint perception capabilities for autonomous navigation in challenging environments.

Abstract

To interact with mobile objects in unfamiliar environments, simultaneously locating, mapping, and tracking the 3D poses of multiple objects is crucial. This paper proposes a Tracklet Graph and Query Graph-based framework, GSLAMOT, to address this challenge. GSLAMOT takes camera and LiDAR multimodal information as input and divides the representation of the dynamic scene into a semantic map representing the static environment, a trajectory of the ego-agent, and an online-maintained Tracklet Graph (TG) for tracking and predicting the 3D poses of the detected mobile objects. A Query Graph (QG) is constructed in each frame from object detections to query and update the TG. For accurate object association, a Multi-criteria Star Graph Association (MSGA) method is proposed to find matched objects between the detections in the QG and the predicted tracklets in the TG. Then, an Object-centric Graph Optimization (OGO) method is proposed to simultaneously optimize the TG, the semantic map, and the agent trajectory. It triangulates the detected objects into the map to enrich the map's semantic information. To address efficiency, the three tightly coupled tasks are handled in parallel. Experiments are conducted on KITTI, Waymo, and an emulated Traffic Congestion dataset that highlights challenging scenarios. They show that GSLAMOT tracks crowded objects accurately while also performing accurate SLAM in challenging scenarios, outperforming state-of-the-art methods. The code and dataset are at https://gslamot.github.io.
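The predict-then-associate loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' MSGA or OGO: it swaps in a constant-velocity motion model and a greedy nearest-neighbour match on object centres, with a distance gate. The names `Tracklet`, `predict`, `associate`, `update`, and the `gate` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
import math


@dataclass
class Tracklet:
    track_id: int
    position: tuple  # (x, y, z) object centre in the world frame
    velocity: tuple  # (vx, vy, vz) displacement per frame (assumed model)


def predict(tracklet, dt=1.0):
    """Constant-velocity prediction of the object centre dt frames ahead."""
    return tuple(p + v * dt for p, v in zip(tracklet.position, tracklet.velocity))


def associate(tracklets, detections, gate=2.0):
    """Greedy nearest-neighbour matching (a stand-in for MSGA, not MSGA itself).

    Returns a dict mapping track_id -> index of the matched detection.
    Pairs farther apart than `gate` are never matched.
    """
    candidates = []
    for t in tracklets:
        predicted = predict(t)
        for j, det in enumerate(detections):
            d = math.dist(predicted, det)
            if d <= gate:
                candidates.append((d, t.track_id, j))
    candidates.sort()  # closest pairs are claimed first

    matches, used_tracks, used_dets = {}, set(), set()
    for _, tid, j in candidates:
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    return matches


def update(tracklet, detection):
    """Overwrite the state with the matched detection; velocity is the per-frame shift."""
    tracklet.velocity = tuple(d - p for d, p in zip(detection, tracklet.position))
    tracklet.position = tuple(detection)


tracklets = [
    Tracklet(0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    Tracklet(1, (10.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
]
detections = [(10.1, 0.9, 0.0), (1.2, 0.1, 0.0), (50.0, 50.0, 0.0)]

matches = associate(tracklets, detections)
for t in tracklets:
    if t.track_id in matches:
        update(t, detections[matches[t.track_id]])
```

In this toy frame, tracklet 0 is matched to the second detection and tracklet 1 to the first, while the distant third detection stays unmatched and would seed a new tracklet in a full tracker. A real system would replace the centre-distance test with richer criteria and refine the states jointly with ego-motion and map points, as the abstract describes.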

Paper Structure

This paper contains 30 sections, 14 equations, 11 figures, and 9 tables.

Figures (11)

  • Figure 1: Our system processes LiDAR point clouds and stereo images as inputs. The 3D detection algorithm extracts detection boxes from the point cloud, while the visual odometry front-end obtains the initial ego-motion pose. In the world frame, we construct query and tracklet graphs for detections and tracklets, respectively, and use the MSGA algorithm for association and tracking. Ego-motion, map points, and tracklets are optimized in OGO. Tracklet states for the next frame are estimated using a motion model and participate in the subsequent tracking.
  • Figure 2: Multi-criteria Star Graph Association.
  • Figure 3: Object-centric Graph Optimization (OGO). Residual edges in solid lines participate in graph optimization, while those in dashed lines do not.
  • Figure 4: The residual curves of our proposed object-centric optimization and classic ego-centric optimization. The residual is the sum of all mapping-based errors and object detection errors in Equation (\ref{equ:tightly}).
  • Figure 5: The system architecture and parallel threads.
  • ...and 6 more figures