Computer Vision for Primate Behavior Analysis in the Wild

Richard Vogg, Timo Lüddecke, Jonathan Henrich, Sharmita Dey, Matthias Nuske, Valentin Hassler, Derek Murphy, Julia Fischer, Julia Ostner, Oliver Schülke, Peter M. Kappeler, Claudia Fichtel, Alexander Gail, Stefan Treue, Hansjörg Scherberger, Florentin Wörgötter, Alexander S. Ecker

TL;DR

The paper addresses automated analysis of primate behavior in the wild using computer vision, focusing on four core tasks and practical learning strategies. It surveys state-of-the-art methods for animal detection, multi-animal tracking, individual identification, and action understanding, describing the inputs, outputs, and datasets of each. It argues for effort-efficient learning, data-centric strategies, and a unified video-based framework that jointly handles detection, tracking, identification, and action understanding. The authors discuss future directions including video-first design, intermediate representations, cross-modal supervision, and foundation models, aiming to connect computer vision methods developed for controlled settings with behavioral science in the wild.

Abstract

Advances in computer vision as well as increasingly widespread video-based behavioral monitoring have great potential for transforming how we study animal cognition and behavior. However, there is still a fairly large gap between the exciting prospects and what can actually be achieved in practice today, especially in videos from the wild. With this perspective paper, we want to contribute towards closing this gap, by guiding behavioral scientists in what can be expected from current methods and steering computer vision researchers towards problems that are relevant to advance research in animal behavior. We start with a survey of the state-of-the-art methods for computer vision problems that are directly relevant to the video-based study of animal behavior, including object detection, multi-individual tracking, individual identification, and (inter)action recognition. We then review methods for effort-efficient learning, which is one of the biggest challenges from a practical perspective. Finally, we close with an outlook into the future of the emerging field of computer vision for animal behavior, where we argue that the field should develop approaches to unify detection, tracking, identification and (inter)action recognition in a single, video-based framework.

Paper Structure (45 sections, 7 figures)

This paper contains 45 sections and 7 figures.

Figures (7)

  • Figure 1: Structure of behavior analysis tasks: Starting from a video taken in the wild, the goal is to obtain a structured representation of actions and interactions between the depicted animals while keeping track of their identities. The behavior analysis output on the right shows the behavior of two individuals (blue and violet) over time.
  • Figure 2: Categories of action and interaction recognition tasks based on input and output temporal format as well as spatial granularity.
  • Figure 3: Schematic overview of effort-efficient learning techniques.
  • Figure 4: Selection of important developments in computer vision and their impact on behavioral analysis.
  • Figure 5: Multi-modal large language models (here GPT4-V) can be used to understand challenging scenes. The model succeeds in recognizing the general setting (monkeys sitting on a branch) but fails to detect details (the baby monkey, spatial relation, interaction). While current models are not capable of a fully automated analysis, they might serve as a tool for pseudo labeling. More examples can be found in Appendix \ref{sec:appGPT}.
  • ...and 2 more figures