Endangered Alert: A Field-Validated Self-Training Scheme for Detecting and Protecting Threatened Wildlife on Roads and Roadsides

Kunming Li, Mao Shan, Stephany Berrio Perez, Katie Luo, Stewart Worrall

TL;DR

This work tackles the challenge of detecting rare wildlife on roads under data-scarce, resource-constrained conditions by introducing a cloud–edge self-training framework augmented with Label-Augmentation Non-Maximum Suppression (LA-NMS). Stage 1 generates synthetic data on the cloud from web images, using LA-NMS and vision-language models (OWL-ViT and SAM) to produce pseudo-labels for an initial edge detector. Stage 2 deploys the edge model, collects field data, and iteratively refines the detector by auto-labelling field data on the cloud and fine-tuning the edge model, enabling continuous adaptation to new environments and a thermal-domain extension. A five-month field deployment demonstrates improved detection accuracy, higher prediction confidence, and practical viability for real-time driver alerts, highlighting the approach’s potential to mitigate animal–vehicle collisions in remote, bandwidth-limited settings.

Abstract

Traffic accidents are a global safety concern, resulting in numerous fatalities each year. A considerable number of these deaths are caused by animal-vehicle collisions (AVCs), which not only endanger human lives but also present serious risks to animal populations. This paper presents a self-training methodology for detecting rare animals, such as the cassowary in Australia, whose survival is threatened by road accidents. The proposed method addresses critical real-world challenges, including acquiring and labelling sensor data for rare animal species in resource-limited environments. It does so by combining cloud and edge computing with automatic data labelling to iteratively improve the detection performance of the field-deployed model. Our approach introduces Label-Augmentation Non-Maximum Suppression (LA-NMS), which incorporates a vision-language model (VLM) to enable automated data labelling. A five-month deployment confirmed the method's robustness and effectiveness, with improved object detection accuracy and increased prediction confidence. The source code is available at https://github.com/acfr/CassDetect

Paper Structure

This paper contains 28 sections, 4 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Deployed system identifying a cassowary crossing a busy road. Upon detecting a cassowary, as illustrated in (a), the system alerts approaching vehicles through a variable message sign (VMS) to help prevent potential collisions, as shown in (b). (Message design credit: Ioni Lewis, Queensland University of Technology)
  • Figure 2: Overview of the proposed self-training ML scheme for roadside animal detection. Initially, the cloud-based model $W_{\text{cloud}}$ synthesises images of cassowaries and generates the pseudo-labels using the web-sourced cassowary images and field background images. These images are used to train the initial field detection model for deployment on the edge device. In the deployment environment, this field model $W_{\text{field}}$ processes images captured in real time and selects relevant data to send back to the cloud server. $W_{\text{cloud}}$ then automatically processes the received field data, generating pseudo-labels, which are used to fine-tune $W_{\text{field}}$. This iterative cycle progressively improves the detection performance of $W_{\text{field}}$. Note that the pseudo-labelling task is performed using the proposed LA-NMS approach, which integrates a VLM (i.e., OWL-ViT (Minderer et al., 2024) in our work) within $W_{\text{cloud}}$.
  • Figure 3: Workflow of the proposed LA-NMS. The approach begins by taking an input animal label and generating multiple related class labels. These augmented labels and the input images are encoded using a text transformer encoder and a vision transformer encoder, respectively, within a VLM, i.e., OWL-ViT in this work. This process identifies detection candidates in the input images based on the augmented labels. Given these detection results, the LA-NMS approach then applies NMS to eliminate redundant bounding boxes, selecting the most likely candidates as the output pseudo-labels.
  • Figure 4: Comparison of mAP for various object classes from the VOC dataset, analysed with and without LA-NMS. 'Label Aug.' in the legend refers to LA-NMS.
  • Figure 5: Performance of the VLM with and without LA-NMS enabled across various threshold settings. 'Label Aug.' in the legend refers to LA-NMS. The results were obtained based on the field evaluation dataset.
  • ...and 3 more figures
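
The LA-NMS workflow described in the Figure 3 caption (augment the input label, query the detector once per label, then apply NMS over the pooled candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholds, and the `fake_detect` stand-in for the VLM are all hypothetical, and the sketch assumes class-agnostic greedy NMS over boxes in (x1, y1, x2, y2) format. How surviving boxes are mapped back to the target class is not stated in this section, so the sketch simply returns the label that produced each kept box.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float, str]        # (box, score, label that produced it)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: Sequence[Detection], iou_thr: float) -> List[Detection]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def la_nms(
    image: object,
    target_label: str,
    related_labels: Sequence[str],
    detect: Callable[[object, str], List[Tuple[Box, float]]],
    score_thr: float = 0.1,   # assumed value, not taken from the paper
    iou_thr: float = 0.5,     # assumed value, not taken from the paper
) -> List[Detection]:
    """Pool detections over the augmented label set, then suppress duplicates."""
    candidates: List[Detection] = []
    for label in [target_label, *related_labels]:
        for box, score in detect(image, label):
            if score >= score_thr:
                candidates.append((box, score, label))
    return nms(candidates, iou_thr)


def fake_detect(image: object, label: str) -> List[Tuple[Box, float]]:
    """Hypothetical stand-in for an open-vocabulary detector such as OWL-ViT."""
    canned = {
        "cassowary": [((10.0, 10.0, 50.0, 50.0), 0.6)],
        "large bird": [((12.0, 12.0, 52.0, 52.0), 0.8),
                       ((100.0, 100.0, 140.0, 140.0), 0.3)],
        "emu": [((11.0, 11.0, 49.0, 49.0), 0.05)],
    }
    return canned.get(label, [])
```

In this toy setup, calling `la_nms(None, "cassowary", ["large bird", "emu"], fake_detect)` pools three candidates above the score threshold and suppresses the lower-scoring of the two heavily overlapping boxes. In a real pipeline, `detect` would wrap the VLM's text-conditioned detection call.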