Re-Identifying Kākā with AI-Automated Video Key Frame Extraction

Paula Maddigan, Andrew Lensen, Rachael C. Shaw

TL;DR

This work tackles the challenge of non-invasively re-identifying individual kākā by automatically extracting high-quality key frames from feeder videos. It introduces a modular AI pipeline that combines detection with a fine-tuned Kākā-YOLO model, blur filtering based on Farnebäck optical-flow scores, DINOv2-based frame embeddings, and clustering-based key-frame selection, followed by cosine-similarity matching against a labeled database. The study shows that careful frame selection and robust embeddings yield high re-ID accuracy (up to ~98.6% in some datasets) across several viable configurations, and it provides a baseline unsupervised framework suitable for extension to trail-camera data and other species. The results highlight the trade-offs between frame quantity, selection strategy, and embedding model choice, and underscore the potential of automated key-frame extraction to improve wildlife monitoring while reducing invasive tagging practices.
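
As an illustration of the final matching step, the following is a minimal sketch of cosine-similarity matching of key-frame embeddings against a labeled database. It is not the authors' implementation: the function names, the toy 2-D vectors standing in for DINOv2 embeddings, and the majority vote over key frames are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=np.float64)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def match_key_frames(query_embeddings, db_embeddings, db_labels):
    """Assign one identity to a set of key frames (hypothetical helper).

    Each key-frame embedding is matched to its most similar database
    embedding by cosine similarity; the per-frame labels are then combined
    by majority vote (the voting rule is an assumption, not from the paper).
    """
    q = l2_normalize(query_embeddings)
    d = l2_normalize(db_embeddings)
    sims = q @ d.T                      # shape: (n_key_frames, n_db_images)
    nearest = sims.argmax(axis=1)       # best database match per key frame
    votes = [db_labels[i] for i in nearest]
    predicted = Counter(votes).most_common(1)[0][0]
    return predicted, votes


# Toy data: two database images and three key frames from one video.
db_embeddings = [[1.0, 0.0], [0.0, 1.0]]
db_labels = ["bird_a", "bird_b"]
key_frames = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9]]

predicted, votes = match_key_frames(key_frames, db_embeddings, db_labels)
print(predicted, votes)
```

Normalising once up front keeps the similarity computation to a single matrix product, which matters when the database holds many images per individual.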

Abstract

Accurate recognition and re-identification of individual animals is essential for successful wildlife population monitoring. Traditional methods, such as leg banding of birds, are time-consuming and invasive. Recent progress in artificial intelligence, particularly computer vision, offers encouraging solutions for smart conservation and efficient automation. This study presents a unique pipeline for extracting high-quality key frames from videos of kākā (Nestor meridionalis), a threatened forest-dwelling parrot in New Zealand. Key frame extraction is well studied in person re-identification; however, its application to wildlife is limited. Using video recordings at a custom-built feeder, we extract key frames and evaluate the re-identification performance of our pipeline. Our unsupervised methodology combines object detection using YOLO and Grounding DINO, optical flow blur detection, image encoding with DINOv2, and clustering methods to identify representative key frames. The results indicate that our proposed key frame selection methods yield image collections which achieve high accuracy in kākā re-identification, providing a foundation for future research using media collected in more diverse and challenging environments. Through the use of artificial intelligence and computer vision, our non-invasive and efficient approach provides a valuable alternative to traditional physical tagging methods for recognising kākā individuals and therefore improving the monitoring of populations. This research contributes to developing fresh approaches in wildlife monitoring, with applications in ecology and conservation biology.

Paper Structure

This paper contains 33 sections, 2 equations, 16 figures, 10 tables.

Figures (16)

  • Figure 1: The custom-built kākā feeder during an active period of feeding. The design features a ledge for the bird to perch on and a nozzle that dispenses food. The setup enables recording of the kākā's head in a profile view to optimise the capture of the unique beak morphology of individuals. The figure inset shows the white, non-reflective plastic cover installed inside the feeder to control the background environment.
  • Figure 2: Methodology for identifying an individual kākā from visual media.
  • Figure 3: Overview of the development of Kākā-YOLO, a fine-tuned YOLO model for kākā object detection.
  • Figure 4: Comparison of inference by (a) the base YOLO11m model and (b) the fine-tuned Kākā-YOLO model.
  • Figure 5: Frame extraction stage of the pipeline.
  • ...and 11 more figures