
CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild

Alex Hoi Hang Chan, Neha Singhal, Onur Kocahan, Andrea Meltzer, Saverio Lubrano, Miyako H. Warrington, Michel Griesser, Fumihiro Kano, Hemal Naik

Abstract

Long-term behavioral monitoring of individual animals is crucial for studying behavioral changes that occur over different time scales, especially for conservation and evolutionary biology. Computer vision methods have proven to benefit biodiversity monitoring, but automated behavior monitoring in wild populations remains challenging. This stems from the lack of datasets that cover a range of computer vision tasks necessary to extract biologically meaningful measurements of individual animals. Here, we introduce such a dataset (CHIRP) with a new method (CORVID) for individual re-identification of wild birds. The CHIRP (Combining beHaviour, Individual Re-identification and Postures) dataset is curated from a long-term population of wild Siberian jays studied in Swedish Lapland, supporting re-identification (re-id), action recognition, 2D keypoint estimation, object detection, and instance segmentation. In addition to traditional task-specific benchmarking, we introduce application-specific benchmarking with biologically relevant metrics (feeding rates, co-occurrence rates) to evaluate the performance of models in real-world use cases. Finally, we present CORVID (COlouR-based Video re-ID), a novel pipeline for individual identification of birds based on the segmentation and classification of colored leg rings, a widespread approach for visual identification of individual birds. CORVID offers a probability-based id tracking method by matching the detected combination of color rings with a database. We use application-specific benchmarking to show that CORVID outperforms state-of-the-art re-id methods. We hope this work offers the community a blueprint for curating real-world datasets from ethically approved biological studies to bridge the gap between computer vision research and biological applications.

Paper Structure

This paper contains 19 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: CHIRP dataset summary. A) Solving the problem of who, with a video re-identification dataset of individuals. B) Solving the problem of what, with an action recognition dataset and 2D keypoint estimation. C) Additional annotations to support the main tasks, including segmentation (yellow) of color rings, and bounding boxes (green) and segmentation (yellow) of birds. D) Application-specific benchmark: 12 independent test videos with per-frame annotations for bounding boxes, identities, and behaviors, with novel metrics on errors in biological measures such as individual feeding rates and paired co-occurrence rates.
  • Figure 2: Summary of leg color ring definitions and the naming convention for the Siberian jays. A) Sample image of an individual, and a list of the different color rings. Ring positions are defined as top/bottom, left/right from the bird’s perspective. B) Class distribution of ring masks provided in the ring segmentation dataset. C) Naming convention: each bird is named by its rings in the order top left, bottom left, top right, bottom right. The color combination of the bird in the picture is oaor: orange (o), aluminium (a), orange (o), red (r).
  • Figure 3: Class distribution for the action recognition dataset.
  • Figure 4: CORVID pipeline. Schematic of the color-based re-ID pipeline. One-second clips from CHIRP are fed into a Mask2Former instance segmentation model to extract ring masks. The rings are grouped into ring pairs based on a distance threshold, then resized and converted into HSV space. The images are fed into a random forest model to predict the probability of each color; these predictions are combined with the associated ring pair to create a probability matrix over every color pair, which is then pooled across frames. Finally, the most probable bird is selected from among the birds that could be present in a given video.
  • Figure 5: Application-specific benchmark results. Ground truth measurements are compared against pipeline predictions to test how different components affect biological measurements. We compared the proposed CORVID pipeline, a fine-tuned MegaDescriptor, and random assignment for individual recognition. All pipelines used YOLOv8 for object detection, BoT-SORT for tracking, and C3D for action recognition. Absolute errors of A) individual-level feeding rates and B) co-occurrence rates, and correlation of C) individual-level feeding rates and D) co-occurrence rates. Individual feeding rate is defined as the number of times an individual pecks at the food (pecks/min); co-occurrence rate is defined as the proportion of time two individuals were detected together, scaled by video length.
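
The probability-based matching step in the CORVID pipeline (Figure 4) can be sketched as follows. This is a minimal illustration, not the actual CORVID code: the color vocabulary, the `RING_DB` database, and the `identify` function are hypothetical names, and real per-frame classifier outputs would come from the random forest over HSV ring crops.

```python
import numpy as np

# Illustrative color vocabulary and ring database (not the actual CORVID data).
COLORS = ["o", "a", "r", "g"]  # orange, aluminium, red, green
RING_DB = {                     # bird id -> rings at (top-left, bottom-left, top-right, bottom-right)
    "oaor": ("o", "a", "o", "r"),
    "gaog": ("g", "a", "o", "g"),
}

def identify(frame_probs, candidates):
    """Pick the most probable bird from per-frame color probabilities.

    frame_probs: array of shape (n_frames, 4 ring positions, n_colors),
                 the per-frame classifier output for each ring position.
    candidates:  bird ids that could be present in this video.
    """
    pooled = frame_probs.mean(axis=0)            # pool probabilities across frames
    col = {c: i for i, c in enumerate(COLORS)}
    scores = {}
    for bird in candidates:
        rings = RING_DB[bird]
        # joint probability of this bird's ring combination under the pooled estimate
        scores[bird] = float(np.prod([pooled[pos, col[c]] for pos, c in enumerate(rings)]))
    return max(scores, key=scores.get)
```

Restricting `candidates` to the birds known to be present mirrors the final step of the pipeline, where the detected color combination is matched against the population database.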
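
The biological metrics of the application-specific benchmark (Figure 5) can be computed from per-frame detections roughly as below. The function names and exact normalization are illustrative assumptions following the definitions in the caption: feeding rate as pecks per minute, and co-occurrence rate as the proportion of frames in which two individuals are detected together.

```python
def feeding_rate(peck_events, video_frames, fps):
    """Individual feeding rate: number of pecks per minute of video."""
    minutes = video_frames / fps / 60.0
    return len(peck_events) / minutes

def co_occurrence_rate(frames_a, frames_b, video_frames):
    """Proportion of frames in which both individuals were detected together."""
    together = len(set(frames_a) & set(frames_b))
    return together / video_frames
```

Comparing these quantities between ground-truth annotations and pipeline output yields the absolute errors and correlations reported in the benchmark.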