Detecting Wildfires on UAVs with Real-time Segmentation Trained by Larger Teacher Models
Julius Pesonen, Teemu Hakala, Väinö Karjalainen, Niko Koivumäki, Lauri Markelin, Anna-Maria Raita-Hakola, Juha Suomalainen, Ilkka Pölönen, Eija Honkavaara
TL;DR
This paper addresses real-time wildfire smoke segmentation on UAVs with limited onboard compute. It introduces a knowledge-distillation framework in which large foundation models generate pixel-level pseudo-labels from bounding-box annotations, and these pseudo-labels supervise a compact PIDNet that performs real-time segmentation onboard. The approach achieves 63.3% mIoU on a diverse, manually labeled test set and runs at ~25 fps on an NVIDIA Jetson Orin NX, with successful smoke recognition at distances of up to 9.7 km in real-world forest burns. The results demonstrate the practicality of turning bounding-box annotations into segmentation supervision for onboard early fire detection, while highlighting areas for improvement such as data diversity and pseudo-label quality.
Abstract
Early detection of wildfires is essential to prevent large-scale fires that cause extensive environmental, structural, and societal damage. Uncrewed aerial vehicles (UAVs) can cover large remote areas effectively, with quick deployment and minimal infrastructure requirements, and equipping them with small cameras and computers enables autonomous real-time detection. In remote areas, however, detection methods are limited to onboard computation due to the lack of high-bandwidth mobile networks. For accurate camera-based localisation, segmentation of the detected smoke is essential, but training data for deep learning-based wildfire smoke segmentation is limited. This study shows how small specialised segmentation models can be trained using only bounding box labels, leveraging zero-shot foundation model supervision. The method has two advantages: it needs only bounding box labels, which are comparatively easy to obtain, and it requires training only the smaller student network. The proposed method achieved 63.3% mIoU on a manually annotated and diverse wildfire dataset. The model runs in real time at ~25 fps on a UAV-carried NVIDIA Jetson Orin NX computer while reliably recognising smoke, as demonstrated at real-world forest burning events. Code is available at: https://gitlab.com/fgi_nls/public/wildfire-real-time-segmentation
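For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection over union (mIoU) can be computed for a two-class (background and smoke) segmentation mask. The function name `mean_iou`, the nested-list mask format, and the choice to average over classes within a single mask are assumptions made for illustration; the exact evaluation protocol behind the 63.3% figure (for example, per-image versus dataset-level accumulation) is not stated here and may differ.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over the classes present in either mask.

    pred and target are equally sized 2D lists of integer class ids
    (assumed here: 0 = background, 1 = smoke).
    """
    ious = []
    for c in range(num_classes):
        inter = 0  # pixels labelled c in both masks
        union = 0  # pixels labelled c in at least one mask
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                p_c, t_c = (p == c), (t == c)
                inter += p_c and t_c
                union += p_c or t_c
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 masks: the prediction marks one background pixel as smoke.
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 0],
          [0, 0, 1]]
score = mean_iou(pred, target)  # background IoU 3/4, smoke IoU 2/3
```

In this toy case the background IoU is 3/4 and the smoke IoU is 2/3, so the mean is 17/24, roughly 0.71.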
