ShadowWolf -- Automatic Labelling, Evaluation and Model Training Optimised for Camera Trap Wildlife Images

Jens Dede, Anna Förster

TL;DR

ShadowWolf tackles the challenge of maintaining robust wildlife detection from camera-trap imagery under diverse environmental conditions by introducing a unified, autonomous labeling and model-training framework. It combines end-to-end processing with crowd-assisted validation (Wolf-or-Not) to iteratively generate labeled data and retrain detectors, demonstrated on wolf camera-trap data from two German parks. The approach yields a measurable improvement in F1 scores over a standalone detector, while examining daytime/nighttime performance and practical compute requirements for field deployment. Overall, ShadowWolf offers a scalable path toward continuous learning and automated model refinement suitable for embedded and GPU-enabled wildlife-monitoring systems.

Abstract

The continuous growth of the global human population is leading to the expansion of human habitats, resulting in decreasing wildlife spaces and increasing human-wildlife interactions. These interactions can range from minor disturbances, such as raccoons in urban waste bins, to more severe consequences, including species extinction. As a result, the monitoring of wildlife is gaining significance in various contexts. Artificial intelligence (AI) offers a solution by automating the recognition of animals in images and videos, thereby reducing the manual effort required for wildlife monitoring. Traditional AI training involves three main stages: image collection, labelling, and model training. However, variability in factors such as landscape (e.g., mountains, open fields, forests), weather (e.g., rain, fog, sunshine), lighting (e.g., day, night), and camera-animal distance presents significant challenges to model robustness and adaptability in real-world scenarios. In this work, we propose a unified framework, called ShadowWolf, designed to address these challenges by integrating and optimizing the stages of AI model training and evaluation. The proposed framework enables dynamic model retraining to adjust to changes in environmental conditions and application requirements, thereby reducing labelling efforts and allowing for on-site model adaptation. This adaptive and unified approach enhances the accuracy and efficiency of wildlife monitoring systems, promoting more effective and scalable conservation efforts.

Paper Structure

This paper contains 38 sections, 7 equations, 12 figures, 12 tables.

Figures (12)

  • Figure 1: Our iterative approach uses the ShadowWolf model to label images. The generated labels are then used to train a new model generation, which we evaluate against a reference dataset. If performance improves, the new model is adopted in ShadowWolf and can also be deployed to other systems, such as detection in the field (marked by the green box). If not, we continue collecting additional data and repeat the process.
  • Figure 2: A wolf captured at night. This picture was taken in the Wingster Waldzoo.
  • Figure 3: The distribution of our image dataset over a day varies between the parks. During the night, images are only recorded when motion is detected within the range of the built-in infrared spot.
  • Figure 4: Evaluation process of ShadowWolf: The automatically generated labels are compared against user-provided ground truth. Ideally, the labeled images would match the ground truth with 100% accuracy, indicating that ShadowWolf performs as effectively as manual labeling.
  • Figure 5: The ShadowWolf framework and its evaluation process aim to transform input videos, images, or image series into well-labeled datasets. The components within the dotted box represent ShadowWolf's three main stages: Preprocessing, Detection, and Postprocessing, each with configurable submodules. The Detection phase utilizes a trained animal detection model that is iteratively improved, as depicted in Figure 1. Performance evaluation is conducted using a reference dataset and predefined metrics, illustrated by the boxes outside the dotted area. This involves comparing ShadowWolf-generated labels with manually created ground truth labels. Ideally, these labels should align perfectly. Combined with the iterative model training, as shown in Figure 1, this results in automatically generated, enhanced models. The framework operates fully automatically and requires minimal administrative intervention.
  • ...and 7 more figures
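
The iterative loop described in the Figure 1 caption (label with the current model, train a new generation, evaluate against a reference dataset, adopt the new model only if it improves) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `iterate_generations`, `label_images`, `train`, `evaluate`, and `collect_more`, as well as the `max_rounds` parameter, are hypothetical placeholders, and the choice of F1 as the comparison score is an assumption based on the TL;DR.

```python
from typing import Any, Callable, List, Sequence, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iterate_generations(
    model: Any,
    label_images: Callable[[Any, Sequence[Any]], List[Any]],  # hypothetical: auto-labelling step
    train: Callable[[Sequence[Any], Sequence[Any]], Any],     # hypothetical: trains a new generation
    evaluate: Callable[[Any], float],                         # hypothetical: score on reference data
    collect_more: Callable[[], Sequence[Any]],                # hypothetical: fetches new images
    max_rounds: int = 5,
) -> Tuple[Any, float, List[float]]:
    """Run the label -> train -> evaluate loop for at most max_rounds rounds."""
    images = list(collect_more())
    best_score = evaluate(model)
    history = [best_score]
    for _ in range(max_rounds):
        # Label the available images with the current best model.
        labels = label_images(model, images)
        # Train a candidate model generation on the generated labels.
        candidate = train(images, labels)
        # Compare the candidate against the reference dataset.
        score = evaluate(candidate)
        if score > best_score:
            # Improved: adopt the candidate (it could also be deployed elsewhere).
            model, best_score = candidate, score
        else:
            # No improvement: collect additional data before the next round.
            images.extend(collect_more())
        history.append(best_score)
    return model, best_score, history
```

Passing the four steps in as callables keeps the sketch independent of any particular detector or labelling tool; in practice `evaluate` would compare generated labels with manually created ground truth, as the Figure 4 and Figure 5 captions describe.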