Benchmarking SAM2-based Trackers on FMOX

Senem Aktas, Charles Markham, John McDonald, Rozenn Dahyot

TL;DR

The study benchmarks SAM2-based trackers on fast-moving object datasets (FMOX) to reveal limitations in current methods. It compares SAM2, DAM4SAM, SAMURAI, and EfficientTAM using mIoU and mDice across 46 FMOX sequences, with initialization from ground-truth bounding boxes. The results show DAM4SAM and SAMURAI consistently outperform SAM2 and EfficientTAM on challenging sequences, while EfficientTAM offers faster compute with a trade-off in accuracy. The findings highlight the importance of memory management and motion-aware strategies for FMOs and provide guidance for deploying SAM2-based trackers in real-time, high-speed contexts.
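The two metrics named above, mIoU and mDice, can be illustrated with a minimal sketch. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`, and it assumes that a frame where the tracker proposes no box scores 0. The function names and the box convention are illustrative choices, not taken from the paper or from any tracker's codebase.

```python
from typing import Optional, Sequence, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area(b: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def dice(a: Box, b: Box) -> float:
    """Dice coefficient: twice the intersection over the sum of areas."""
    inter = intersection_area(a, b)
    total = box_area(a) + box_area(b)
    return 2.0 * inter / total if total > 0 else 0.0


def sequence_scores(
    preds: Sequence[Optional[Box]], gts: Sequence[Box]
) -> Tuple[float, float]:
    """Return (mIoU, mDice) over one sequence.

    A prediction of None (tracker proposed no box) is scored as 0,
    which is an assumption of this sketch.
    """
    if len(preds) != len(gts) or not gts:
        raise ValueError("preds and gts must be non-empty and equally long")
    ious = [iou(p, g) if p is not None else 0.0 for p, g in zip(preds, gts)]
    dices = [dice(p, g) if p is not None else 0.0 for p, g in zip(preds, gts)]
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy sequence: perfect match, half overlap, then a missed frame.
gts = [(0, 0, 10, 10)] * 3
preds = [(0, 0, 10, 10), (5, 0, 15, 10), None]
miou, mdice = sequence_scores(preds, gts)
```

For any pair of boxes Dice is at least as large as IoU, so mDice is never lower than mIoU on the same sequence; reporting both mainly changes how partial overlaps are weighted.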

Abstract

Several object tracking pipelines extending Segment Anything Model 2 (SAM2) have been proposed in the past year. In these pipelines, the object is followed and segmented from a single exemplar template provided by the user on an initialization frame. We propose to benchmark these high-performing trackers (SAM2, EfficientTAM, DAM4SAM and SAMURAI) on datasets containing fast moving objects (FMO), which are specifically designed to be challenging for tracking approaches. The goal is to better understand the current limitations of state-of-the-art trackers by providing more detailed insights into their behavior. We show that, overall, the trackers DAM4SAM and SAMURAI perform well on the more challenging sequences.

Paper Structure

This paper contains 20 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: FMOX combines 4 datasets, with a total of 46 sequences used for our benchmark. Using ground truth information, object size is divided into 5 categories (0 for extremely tiny up to 4 for large; Aktas2025), and a mean object size is computed for each sequence (reported on the $x$-axis). To capture displacement between two successive frames, the mean IoU between two successive ground truth bounding boxes (BB) is also computed for each sequence (reported on the $y$-axis). In contrast to Falling Object and TbD-3D, the FMOv2 and TbD datasets are more challenging: their objects are smaller and their successive bounding boxes overlap less.
  • Figure 2: Box plots of mIoU and mDice results on each of the 4 datasets included in FMOX (reported on the $x$-axis). Both FMOv2 and TbD are more challenging, as the tracked objects are smaller and the ground truth bounding boxes in frames $n$ and $n+1$ often do not overlap (cf. Fig. 1). The low minima (=0) highlight the challenge that some sequences in these datasets present for the trackers tested.
  • Figure 3: Tracking performance (IoU) across frames on the sequence v_rubber_GTgamma from the Falling Object dataset. All trackers fail to propose a bounding box for frame 38, while EfficientTAM also fails for frames 39 to 41 inclusive. Corresponding frame numbers are given on top of each frame, and object (rubber) locations are indicated with red ground truth bounding boxes.
  • Figure 4: Tracking performance (IoU) across frames on the sequence v_key_GTgamma from Falling Object dataset. All trackers fail to propose a bounding box for frame 36. For frame 35, DAM4SAM is the sole successful tracker.
  • Figure 5: Examples of tracking performance (IoU) across frames on the sequences HighFPS_GT_depth2 from the TbD-3D dataset (top: all trackers perform well and provide similar results) and throw_tennis from the TbD dataset (bottom: all trackers perform poorly, with the exception of DAM4SAM). EfficientTAM fails to initialize on throw_tennis because of the strong motion blur on the object, so no performance curve is shown for it in the bottom graph.
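
The displacement measure described in the Figure 1 caption, the mean IoU between successive ground truth bounding boxes of a sequence, can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)` tuples, every frame has a ground truth box, and the function names are hypothetical rather than taken from the paper's code.

```python
from typing import List, Sequence, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def successive_ious(gt_boxes: Sequence[Box]) -> List[float]:
    """IoU between the ground truth boxes of frames n and n+1, for all n."""
    if len(gt_boxes) < 2:
        raise ValueError("need at least two frames")
    return [box_iou(a, b) for a, b in zip(gt_boxes[:-1], gt_boxes[1:])]


def mean_successive_iou(gt_boxes: Sequence[Box]) -> float:
    """Mean overlap between consecutive ground truth boxes.

    Values near 0 mean the object leaves its previous box between frames
    (large displacement relative to its size); values near 1 mean it
    barely moves.
    """
    values = successive_ious(gt_boxes)
    return sum(values) / len(values)


# Toy sequence: a 10x10 object moves 5 px, then jumps 15 px.
toy_sequence = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 0, 30, 10)]
per_pair = successive_ious(toy_sequence)
mean_overlap = mean_successive_iou(toy_sequence)
```

Because the measure is relative to object size, a small object moving a few pixels can yield the same low overlap as a large object moving much further, which is consistent with the caption's pairing of small objects and low successive overlap.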