Detecting Wildfires on UAVs with Real-time Segmentation Trained by Larger Teacher Models

Julius Pesonen, Teemu Hakala, Väinö Karjalainen, Niko Koivumäki, Lauri Markelin, Anna-Maria Raita-Hakola, Juha Suomalainen, Ilkka Pölönen, Eija Honkavaara

TL;DR

This paper addresses real-time wildfire smoke segmentation on UAVs with limited onboard compute. It introduces a knowledge-distillation framework in which large foundation models generate pixel-level pseudo-labels from bounding-box annotations, and these pseudo-labels supervise a compact PIDNet that performs real-time segmentation onboard. The approach achieves 63.3% mIoU on a diverse, manually labelled test set and runs at ~25 fps on an NVIDIA Jetson Orin NX, recognising smoke from up to 9.7 km away at real-world forest burns. The results show that bounding-box labels can practically be turned into segmentation supervision for onboard early fire detection, while leaving room for improvement in data diversity and pseudo-label quality.

Abstract

Early detection of wildfires is essential to prevent large-scale fires that cause extensive environmental, structural, and societal damage. Uncrewed aerial vehicles (UAVs) can cover large remote areas effectively, can be deployed quickly, and require minimal infrastructure; equipping them with small cameras and computers enables autonomous real-time detection. In remote areas, however, detection methods are limited to onboard computation due to the lack of high-bandwidth mobile networks. For accurate camera-based localisation, segmentation of the detected smoke is essential, but training data for deep learning-based wildfire smoke segmentation is limited. This study shows how small specialised segmentation models can be trained using only bounding box labels, leveraging zero-shot foundation model supervision. The method has two advantages: it needs only bounding box labels, which are fairly easy to obtain, and it requires training only the smaller student network. The proposed method achieved 63.3% mIoU on a manually annotated and diverse wildfire dataset. The model used runs in real time at ~25 fps on a UAV-carried NVIDIA Jetson Orin NX computer while reliably recognising smoke, as demonstrated at real-world forest burning events. Code is available at: https://gitlab.com/fgi_nls/public/wildfire-real-time-segmentation
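For readers unfamiliar with the reported metric, the following is a minimal sketch of mean intersection over union (mIoU) for a two-class smoke/background mask. The function name `mean_iou`, the toy grids, and the choice to average over the classes present in a single image pair are assumptions for illustration; the paper's exact evaluation protocol (for example, how scores are aggregated across the dataset) is not stated in this section.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two equally sized label grids (lists of rows).

    Hypothetical helper: classes absent from both grids are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts towards both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)


# Toy example: 1 = smoke, 0 = background (made-up values, not paper data).
pred = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
target = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
score = mean_iou(pred, target)
print(f"mIoU = {score:.3f}")
```

In this toy case the background IoU is 6/8 and the smoke IoU is 4/6, so the mean rewards getting both classes right rather than only the dominant background.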

Paper Structure

This paper contains 12 sections, 6 equations, 4 figures, and 17 tables.

Figures (4)

  • Figure 1: Inference model training scheme. The teacher model is either trained with a bounding-box supervision method such as BoxSnake (Yang et al., 2023) or guided with the bounding boxes, as with SAM (Kirillov et al., 2023).
  • Figure 2: Examples from the bounding-box-labelled wildfire smoke datasets used. Images with a 4:3 aspect ratio are from the AI For Mankind datasets; the rest are from the UAV dataset.
  • Figure 3: The distillation scheme used to train the final inference model. The label edges are generated by applying Canny edge detection (Canny, 1986) to the label mask image.
  • Figure 4: Visualisations of the practical real-time detection tests. The results shown, from left to right, were obtained at ground distances of 1.4, 4.0 and 9.7 kilometres from the burn.
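
The edge labels mentioned in the Figure 3 caption can be illustrated with a small sketch. Note that this is not Canny edge detection: it is a simpler stand-in that marks every mask pixel with a background 4-neighbour, which on a clean binary mask gives a comparable thin outline. The function name `mask_boundary` and the toy mask are assumptions for illustration, not taken from the paper's code.

```python
def mask_boundary(mask):
    """Mark mask pixels that have at least one in-image background 4-neighbour.

    Hypothetical stand-in for edge-label generation; the paper uses Canny.
    Pixels on the image border are not treated as edges by themselves.
    """
    height, width = len(mask), len(mask[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # only foreground (smoke) pixels can be edge pixels
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and not mask[ny][nx]:
                    edges[y][x] = 1
                    break
    return edges


# Toy 5x5 pseudo-label mask with a 3x3 smoke region in the centre.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edge_label = mask_boundary(mask)
for row in edge_label:
    print("".join("#" if v else "." for v in row))
```

Only the outline of the smoke region is marked; the interior pixel stays unmarked, which is the kind of boundary target an edge-supervised branch would be trained against.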