Consistent multi-animal pose estimation in cattle using dynamic Kalman filter based tracking
Maarten Perneel, Ines Adriaens, Ben Aernouts, Jan Verwaeren
TL;DR
This work addresses the challenge of studying social and individual cattle behaviours by integrating pose estimation with online tracking to produce temporally coherent multi-animal skeleton data. It introduces KeySORT, an adaptive Kalman-filter-based tracker that operates directly on skeletal keypoints, enabling bounding-box–free tracking and improved temporal stability of keypoint coordinates. The authors show that up to 80% of ground-truth keypoints can be detected with high accuracy and that KeySORT substantially reduces frame-to-frame keypoint jitter, maintaining robustness under daylight and night-vision conditions. The approach is designed to generalize beyond cattle to other species and supports automated behavioural monitoring and data reuse for broader research questions.
Abstract
Over the past decade, studying animal behaviour with the help of computer vision has become more popular. Replacing human observers with computer vision lowers the cost of data collection and therefore allows more extensive datasets to be collected. However, most available computer vision algorithms for studying animal behaviour are highly tailored towards a single research objective, limiting the possibilities for data reuse. In this context, pose estimation in combination with animal tracking offers an opportunity to obtain a higher-level representation that captures both the spatial and the temporal component of animal behaviour. Such a higher-level representation makes it possible to answer a wide variety of research questions simultaneously, without the need to repeatedly develop tailored computer vision algorithms. In this paper, we therefore first address several weaknesses of current pose-estimation algorithms and thereafter introduce KeySORT (Keypoint Simple and Online Realtime Tracking). KeySORT deploys an adaptive Kalman filter to construct tracklets in a bounding-box-free manner, significantly improving the temporal consistency of detected keypoints. We focus on pose estimation in cattle, but our methodology can easily be generalised to other animal species. Our test results indicate that our algorithm is able to detect up to 80% of the ground-truth keypoints with high accuracy, with only a limited drop in performance when daylight recordings are compared to night-vision recordings. Moreover, by using KeySORT to construct skeletons, the temporal consistency of the generated keypoint coordinates was substantially improved, offering opportunities for automated behaviour monitoring of animals.
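To make the idea of Kalman-filter-based keypoint smoothing more concrete, the following is a minimal sketch and not the KeySORT implementation. It assumes a constant-velocity motion model applied independently to the x and y coordinate of a single keypoint, and it assumes that "adaptive" can be approximated by inflating the measurement noise when the detector confidence is low. The names AxisFilter, smooth_keypoint and jitter, as well as the parameters base_r, min_conf and q, are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class AxisFilter:
    """Constant-velocity Kalman filter for one coordinate of one keypoint.

    Illustrative only: state is (pos, vel), the time step is one frame, and
    the covariance is stored as the entries of [[p_pp, p_pv], [p_pv, p_vv]].
    """
    pos: float
    vel: float = 0.0
    p_pp: float = 100.0
    p_pv: float = 0.0
    p_vv: float = 100.0

    def predict(self, q: float = 1.0) -> None:
        # x <- F x and P <- F P F^T + Q, with F = [[1, 1], [0, 1]], Q = q * I.
        self.pos += self.vel
        self.p_pp = self.p_pp + 2.0 * self.p_pv + self.p_vv + q
        self.p_pv = self.p_pv + self.p_vv
        self.p_vv = self.p_vv + q

    def update(self, z: float, r: float) -> None:
        # Only the position is observed (H = [1, 0]); r is the noise variance.
        s = self.p_pp + r
        k_pos = self.p_pp / s
        k_vel = self.p_pv / s
        residual = z - self.pos
        self.pos += k_pos * residual
        self.vel += k_vel * residual
        # P <- (I - K H) P, written out entry by entry.
        self.p_vv -= k_vel * self.p_pv
        self.p_pv *= 1.0 - k_pos
        self.p_pp *= 1.0 - k_pos


def smooth_keypoint(observations, base_r=25.0, min_conf=0.05):
    """Smooth a sequence of (x, y, confidence) detections of one keypoint.

    Assumption: the measurement noise is base_r / confidence, so detections
    with low confidence pull the estimate less than confident ones.
    """
    x0, y0, _ = observations[0]
    fx, fy = AxisFilter(x0), AxisFilter(y0)
    smoothed = [(x0, y0)]
    for x, y, conf in observations[1:]:
        r = base_r / max(conf, min_conf)
        for axis_filter, z in ((fx, x), (fy, y)):
            axis_filter.predict()
            axis_filter.update(z, r)
        smoothed.append((fx.pos, fy.pos))
    return smoothed


def jitter(points):
    """Mean frame-to-frame displacement of a sequence of (x, y) points."""
    steps = [math.hypot(bx - ax, by - ay)
             for (ax, ay), (bx, by) in zip(points, points[1:])]
    return sum(steps) / len(steps)
```

Applied to a stationary keypoint whose detected position alternates around its true location, smooth_keypoint returns a trajectory with lower jitter than the raw detections. A tracker of the kind described in the abstract would additionally need to associate detected skeletons with existing tracklets from frame to frame, which this sketch leaves out.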
