Temporally Propagated Masks and Bounding Boxes: Combining the Best of Both Worlds for Multi-Object Tracking
Tomasz Stanczyk, Francois Bremond
TL;DR
A novel approach, McByte, is proposed, which incorporates a temporally propagated segmentation mask as a strong association cue within a tracking-by-detection framework, which enhances robustness and generalizability without per-sequence tuning.
Abstract
Multi-object tracking (MOT) involves identifying and consistently tracking objects across video sequences. Traditional tracking-by-detection methods, while effective, often require extensive tuning and lack generalizability. On the other hand, segmentation mask-based methods are more generic but struggle with tracking management, making them unsuitable for MOT. We propose a novel approach, McByte, which incorporates a temporally propagated segmentation mask as a strong association cue within a tracking-by-detection framework. By combining bounding box and propagated mask information, McByte enhances robustness and generalizability without per-sequence tuning. Evaluated on four benchmark datasets - DanceTrack, MOT17, SoccerNet-tracking 2022, and KITTI-tracking - McByte demonstrates performance gain in all cases examined. At the same time, it outperforms existing mask-based methods. Implementation code will be provided upon acceptance.
