TOSS: Real-time Tracking and Moving Object Segmentation for Static Scene Mapping
Seoyeon Jang, Minho Oh, Byeongho Yu, I Made Aswin Nahrendra, Seungjae Lee, Hyungtae Lim, Hyun Myung
TL;DR
The paper targets real-time autonomous navigation in dynamic environments by jointly handling moving-object segmentation (MOS) and static map building. It introduces TOSS, a real-time MOS framework that integrates online object tracking with static-map construction. It has two key components. The first is a hierarchical association cost matrix that reduces data-association complexity from $O(N^2)$ to $O(kN)$. The second is DS-Voting, a refinement step that uses spatio-temporal cues to improve dynamic/static labeling. The approach is validated on SemanticKITTI and on challenging real-world datasets. There it achieves a superior Preservation Rate (PR), a competitive Rejection Rate (RR), and robust performance under pose inaccuracies, and its runtimes are lower than those of exhaustive methods, which enables real-time operation. Overall, TOSS enables robust, real-time MOS and static map creation in unstructured environments, supporting safer navigation and higher-quality maps for legged-robot platforms.
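The $O(N^2) \to O(kN)$ claim can be illustrated with a two-stage association. This is a minimal Python sketch of one generic way to bound the expensive part of association, not the TOSS implementation. It makes several assumptions. Objects are represented by 3D centroids. A hypothetical 2D grid (`CELL`) provides a cheap coarse gating stage that keeps at most `k` candidate tracks per detection. Euclidean centroid distance stands in for the real fine cost. Greedy matching stands in for whatever assignment solver the paper uses. All function names here are hypothetical.

```python
import math
from collections import defaultdict

CELL = 2.0  # hypothetical coarse-grid cell size (metres)


def cell_of(p, cell=CELL):
    """Map an (x, y, z) centroid to its 2D grid cell."""
    return (math.floor(p[0] / cell), math.floor(p[1] / cell))


def fine_cost(track, det):
    # Stand-in fine cost: 3D Euclidean distance between centroids.
    return math.dist(track, det)


def hierarchical_cost(tracks, dets, k=3, cell=CELL):
    """Sparse cost 'matrix' {(track_idx, det_idx): cost}.

    Coarse stage: only tracks in the 3x3 block of cells around a detection
    are candidates. Fine stage: the expensive cost is evaluated for at most
    k of them, so fine-cost evaluations are bounded by k * len(dets) instead
    of len(tracks) * len(dets).
    """
    grid = defaultdict(list)
    for i, t in enumerate(tracks):
        grid[cell_of(t, cell)].append(i)

    costs = {}
    for j, d in enumerate(dets):
        cx, cy = cell_of(d, cell)
        cand = [i for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                for i in grid.get((cx + dx, cy + dy), [])]
        # Rank candidates by a cheap 2D squared distance and keep the k best.
        cand.sort(key=lambda i: (tracks[i][0] - d[0]) ** 2 + (tracks[i][1] - d[1]) ** 2)
        for i in cand[:k]:
            costs[(i, j)] = fine_cost(tracks[i], d)
    return costs


def greedy_assign(costs, max_cost=1.5):
    """Greedy one-to-one matching on the sparse costs (a stand-in solver)."""
    used_t, used_d, matches = set(), set(), []
    for (i, j), c in sorted(costs.items(), key=lambda kv: kv[1]):
        if c > max_cost:
            break
        if i in used_t or j in used_d:
            continue
        matches.append((i, j))
        used_t.add(i)
        used_d.add(j)
    return matches


# Usage: three existing tracks and three new detections (centroids in metres).
tracks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
dets = [(0.3, 0.1, 0.0), (9.8, 0.2, 0.0), (50.0, 50.0, 0.0)]
costs = hierarchical_cost(tracks, dets, k=2)
matches = greedy_assign(costs)
print(len(costs), matches)  # 2 fine-cost evaluations instead of 9
```

In the usage example, the far-away third detection gets no candidates and stays unmatched. In a tracker it would typically start a new track. When objects are spread out, the 3x3 gather touches only a few tracks. A k-nearest-neighbour query on a KD-tree could play the same coarse-gating role.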
Abstract
Safe navigation with simultaneous localization and mapping (SLAM) is crucial for autonomous robots operating in challenging environments. To achieve this goal, both detecting moving objects in the surroundings and building a static map are essential. However, existing moving object segmentation methods have been developed separately for each of these tasks, making it challenging to perform real-time navigation and precise static map building simultaneously. In this paper, we propose an integrated real-time framework that combines online tracking-based moving object segmentation with static map building. For safe navigation, we introduce a computationally efficient hierarchical association cost matrix that enables real-time moving object segmentation. For precise static mapping, we present a voting-based method, DS-Voting, that achieves accurate dynamic object removal and static object recovery by emphasizing the spatio-temporal differences between dynamic and static objects. We evaluate the proposed method quantitatively and qualitatively on the SemanticKITTI dataset and in challenging real-world environments. The results demonstrate that dynamic objects can be clearly distinguished and that this segmentation can be incorporated into static map construction, even in environments with stairs, steep hills, and dense vegetation.
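The abstract describes DS-Voting only at a high level: it is a voting scheme that separates dynamic from static points using their spatio-temporal differences. The Python sketch below is a generic per-voxel temporal vote written under stated assumptions. It is not the TOSS algorithm. It assumes that scans are already registered in map coordinates and that an upstream tracking-based MOS stage supplies per-point dynamic/static labels. It also assumes a hypothetical voxel size `VOXEL`, one vote per occupied voxel per frame, and a hypothetical threshold `dyn_ratio`. Under this rule, a voxel flagged dynamic in most of the frames it occupies is removed. A voxel that persists across frames and is only occasionally flagged dynamic is recovered as static.

```python
import math
from collections import defaultdict

VOXEL = 0.5  # hypothetical voxel size (metres)


def voxel_key(p, size=VOXEL):
    """Integer voxel index of an (x, y, z) point."""
    return tuple(math.floor(c / size) for c in p)


def ds_vote(scans, dyn_ratio=0.5):
    """Label each occupied voxel 'dynamic' or 'static' by temporal voting.

    scans: list of frames; each frame is a list of ((x, y, z), is_dynamic)
    pairs in map coordinates.
    """
    occupied = defaultdict(int)   # frames in which the voxel was occupied
    dyn_votes = defaultdict(int)  # of those, frames in which it was flagged dynamic
    for frame in scans:
        per_frame = {}
        for p, is_dyn in frame:
            k = voxel_key(p)
            # One vote per voxel per frame; any dynamic point flags the voxel.
            per_frame[k] = per_frame.get(k, False) or is_dyn
        for k, flagged in per_frame.items():
            occupied[k] += 1
            dyn_votes[k] += int(flagged)
    return {k: "dynamic" if dyn_votes[k] / n >= dyn_ratio else "static"
            for k, n in occupied.items()}


def static_points(scans, labels):
    """Keep every point whose voxel was voted static (including recovered points)."""
    return [p for frame in scans for p, _ in frame if labels[voxel_key(p)] == "static"]


# Usage: a wall voxel seen in 4 frames (falsely flagged dynamic once) and a
# pedestrian voxel seen in a single frame.
scans = [
    [((5.1, 0.2, 0.3), False), ((2.2, 2.2, 0.1), True)],
    [((5.1, 0.2, 0.3), True)],
    [((5.1, 0.2, 0.3), False)],
    [((5.2, 0.1, 0.3), False)],
]
labels = ds_vote(scans)
static_map = static_points(scans, labels)
print(labels)  # wall voxel -> static (recovered), pedestrian voxel -> dynamic
```

Here the spatial cue is reduced to voxel aggregation and the temporal cue to the per-frame vote ratio. A fuller version could also pool votes from neighbouring voxels or weight recent frames differently. Those choices are left open because the abstract does not specify them.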
