Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos
Duc Pham, Matthew Hansen, Félicie Dhellemmes, Jens Krause, Pia Bideau
TL;DR
The paper tackles long-term tracking of collective animal behavior from moving drone footage in open environments where landmarks are scarce. It introduces SwDA, which fuses frame-level semantic segmentation with a particle-filter Bayesian tracker to recursively integrate observations $o_t$ and drone motion, yielding 2D swarm footprints and 3D world trajectories via $p = K [R|t] P$. Key contributions include a novel framework for world-coordinate swarm tracking in marine settings, a 40-minute drone video dataset with synchronized sensors and pixel-accurate masks, and comprehensive evaluations showing robustness in low-data regimes and accurate 3D localization. The work enables non-invasive, scalable study of open-ocean collective behavior and demonstrates how learning-based perception can be effectively integrated with classical state estimation for ecological research.
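The projection $p = K [R|t] P$ can be illustrated with a minimal sketch. It takes $K$ as the camera intrinsics and $[R|t]$ as the world-to-camera pose, following the usual pinhole convention. The function names, the numeric values, and the idea of recovering a world position by intersecting the viewing ray with a flat sea surface at $Z = 0$ are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def project(K, R, t, P_world):
    """Project a 3D world point to pixel coordinates via p ~ K [R|t] P."""
    P_h = np.append(P_world, 1.0)             # homogeneous world point (4,)
    Rt = np.hstack([R, t.reshape(3, 1)])      # 3x4 extrinsic matrix [R|t]
    p_h = K @ Rt @ P_h                        # homogeneous pixel (3,)
    return p_h[:2] / p_h[2]                   # divide by depth

def backproject_to_plane(K, R, t, pixel, z_plane=0.0):
    """Intersect the viewing ray through `pixel` with the plane Z = z_plane.

    Assumption: the sea surface is a horizontal plane in world coordinates.
    """
    ray_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray_world = R.T @ ray_cam                 # ray direction in world frame
    cam_center = -R.T @ t                     # camera position in world frame
    s = (z_plane - cam_center[2]) / ray_world[2]
    return cam_center + s * ray_world

# Hypothetical setup: drone 50 m above the surface, camera pointing straight down.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.diag([1.0, -1.0, -1.0])                # world Z up, camera z looks down
cam_center = np.array([0.0, 0.0, 50.0])
t = -R @ cam_center

P = np.array([5.0, -3.0, 0.0])                # a point on the sea surface
pixel = project(K, R, t, P)
P_recovered = backproject_to_plane(K, R, t, pixel)
```

In practice $R$ and $t$ would be derived per frame from the drone's IMU and GPS readings; here they are fixed by hand to keep the example self-contained.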
Abstract
Easily accessible sensing platforms, such as drones with diverse onboard sensors, have greatly expanded the study of animal behavior in natural environments. Yet analyzing vast amounts of unlabeled video, often spanning hours, remains a challenge for machine learning, especially in computer vision. Existing approaches often analyze only a few frames, whereas our focus is on long-term animal behavior analysis. To address this challenge, we use classical probabilistic methods for state estimation, such as particle filtering. By incorporating recent advances in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even when little data is available. Particle filters offer a provably optimal algorithmic structure for recursively integrating new incoming information. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework goes beyond classical 2D object tracking: it tracks the position and spatial expansion of the fish school in world coordinates by fusing video data with the drone's onboard sensor information (GPS and IMU). For the first time, the presented framework allows researchers to study the collective behavior of fish schools in their natural social and environmental context in a non-invasive and scalable way.
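To make the recursive structure of particle filtering concrete, the following is a minimal sketch of a generic bootstrap particle filter tracking a single 2D position. It is not the paper's tracker: the random-walk motion model, the Gaussian observation likelihood, the noise levels, and the function name `particle_filter_step` are all assumptions chosen for illustration, and the observation stands in for something like the centroid of a segmentation mask.

```python
import numpy as np

def particle_filter_step(particles, weights, observation, rng,
                         motion_std=1.0, obs_std=2.0):
    """One predict/update/resample cycle of a bootstrap particle filter."""
    # Predict: propagate particles with an assumed random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Update: reweight by an assumed isotropic Gaussian observation likelihood.
    sq_dist = np.sum((particles - observation) ** 2, axis=1)
    log_w = np.log(weights + 1e-300) - 0.5 * sq_dist / obs_std**2
    log_w -= log_w.max()                      # stabilize before exponentiating
    weights = np.exp(log_w)
    weights /= weights.sum()

    # Resample (systematic) when the effective sample size drops too low.
    n = len(weights)
    ess = 1.0 / np.sum(weights**2)
    if ess < 0.5 * n:
        positions = (rng.random() + np.arange(n)) / n
        idx = np.searchsorted(np.cumsum(weights), positions)
        idx = np.minimum(idx, n - 1)          # guard against rounding at the end
        particles = particles[idx]
        weights = np.full(n, 1.0 / n)
    return particles, weights

# Synthetic usage: a target drifting at constant velocity, observed with noise.
rng = np.random.default_rng(0)
n_particles = 500
particles = rng.normal(0.0, 5.0, size=(n_particles, 2))
weights = np.full(n_particles, 1.0 / n_particles)
true_pos = np.array([0.0, 0.0])
for _ in range(50):
    true_pos = true_pos + np.array([0.5, 0.2])
    observation = true_pos + rng.normal(0.0, 2.0, size=2)
    particles, weights = particle_filter_step(particles, weights, observation, rng)

estimate = weights @ particles                # weighted mean of the posterior
```

Each call folds one new observation into the current belief, so the cost per frame stays constant regardless of video length. A tracker for a whole fish school would need a richer state (for example, extent as well as position) and a motion model that accounts for the drone's own movement.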
