Zero-Shot Multi-Animal Tracking in the Wild

Jan Frederik Meier; Timo Lüddecke

TL;DR

This work tackles zero-shot multi-animal tracking in the wild by adapting SAM2MOT to operate without retraining or hyperparameter tuning. It introduces three components (adaptive detection thresholds, mask-based track initialization, and density-aware reconstruction) built on top of Grounding DINO and SAM 2 so that the tracker generalizes across diverse datasets. Across four benchmarks, the method achieves strong HOTA and association scores, indicating reliable cross-domain performance that can support scalable wildlife monitoring and behavioral analysis. The approach targets practical applicability: it emphasizes accuracy and robustness while acknowledging that runtime and scalability remain concerns in crowded scenes.
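Since the summary reports results in terms of HOTA and association quality, a brief illustration of how these scores relate may help. The sketch below follows the standard HOTA definition at a single localization threshold, where detection accuracy (DetA) and association accuracy (AssA) are combined by a geometric mean. It is not the evaluation code used in this work; the function name `hota_at_alpha` and its input layout are assumptions made for illustration, and the full metric additionally averages over a range of localization thresholds.

```python
from math import sqrt


def hota_at_alpha(tp_assoc, num_fn, num_fp):
    """Compute DetA, AssA and HOTA at one localization threshold.

    tp_assoc: one (tpa, fna, fpa) tuple per true-positive detection, giving
              the counts of true-positive, false-negative and false-positive
              associations for that detection (hypothetical input layout).
    num_fn:   number of missed ground-truth detections.
    num_fp:   number of spurious predicted detections.
    """
    num_tp = len(tp_assoc)
    total = num_tp + num_fn + num_fp
    if total == 0 or num_tp == 0:
        # No matches at all: every score is zero by convention here.
        return {"DetA": 0.0, "AssA": 0.0, "HOTA": 0.0}

    # Detection accuracy: Jaccard index over detections.
    det_a = num_tp / total

    # Association accuracy: mean association Jaccard over true positives.
    ass_a = sum(tpa / (tpa + fna + fpa) for tpa, fna, fpa in tp_assoc) / num_tp

    # HOTA at this threshold is the geometric mean of the two.
    return {"DetA": det_a, "AssA": ass_a, "HOTA": sqrt(det_a * ass_a)}


if __name__ == "__main__":
    # Two matched detections, each with imperfect association,
    # plus one missed object and one spurious detection.
    scores = hota_at_alpha([(2, 1, 1), (2, 1, 1)], num_fn=1, num_fp=1)
    print(scores)
```

Because the two factors enter symmetrically, a tracker cannot reach a high HOTA through good detection alone; identity switches lower AssA and pull the combined score down.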

Abstract

Multi-animal tracking is crucial for understanding animal ecology and behavior. However, it remains a challenging task due to variations in habitat, motion patterns, and species appearance. Traditional approaches typically require extensive model fine-tuning and heuristic design for each application scenario. In this work, we explore the potential of recent vision foundation models for zero-shot multi-animal tracking. By combining a Grounding DINO object detector with the Segment Anything Model 2 (SAM 2) tracker and carefully designed heuristics, we develop a tracking framework that can be applied to new datasets without any retraining or hyperparameter adaptation. Evaluations on ChimpAct, Bird Flock Tracking, AnimalTrack, and a subset of GMOT-40 demonstrate strong and consistent performance across diverse species and environments. The code is available at https://github.com/ecker-lab/SAM2-Animal-Tracking.
