Bootstrapping Corner Cases: High-Resolution Inpainting for Safety Critical Detect and Avoid for Automated Flying
Jonathan Lyhs, Lars Hinneburg, Michael Fischer, Florian Ölsner, Stefan Milz, Jeremy Tschirner, Patrick Mäder
TL;DR
The paper tackles data scarcity in safety-critical Detect and Avoid for drones by introducing an inpainting-based pipeline that generates high-resolution, labeled images with airborne objects inserted into real backgrounds. It evaluates two object-synthesis approaches, Pix2Pix (cGAN-based) and Stable Diffusion (latent diffusion), within a unified data-generation workflow and highlights each method's strengths and limitations. Pix2Pix provides accurate ground-truth bounding boxes with faster, more predictable labeling, while diffusion offers higher visual realism but yields coarser ground-truth boxes that require adaptation. The study demonstrates that large synthetic datasets can be produced on consumer hardware and reveals a substantial domain gap when a detector trained on real data is applied to the synthetic data, motivating future fine-tuning and pre-training to improve detector robustness and generalization.
Abstract
Modern machine learning techniques have shown tremendous potential, especially for object detection in camera images. For this reason, they are also used to enable safety-critical automated processes such as autonomous drone flights. We present a study on object detection for Detect and Avoid, a safety-critical function that detects air traffic during automated drone flights. A persistent problem is generating good and, above all, large datasets, since a detection is itself the corner case. Most models suffer from limited ground truth in the raw data, e.g., recorded air traffic or frontal flights with a small aircraft. This often leads to critically poor detection rates. We overcome this problem by using inpainting methods to bootstrap the dataset such that it explicitly contains the corner cases of the raw data. We provide an overview of inpainting methods and generative models and present an example pipeline given a small annotated dataset. We validate our method by generating a high-resolution dataset, which we make publicly available, and presenting it to an independent object detector that was trained entirely on real data.
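
To make the labeling idea concrete, the following is a minimal sketch of inserting an object into a real background and deriving its bounding box from the object mask. It is not the pipeline described in this paper: plain alpha compositing stands in for the generative inpainting step (Pix2Pix or Stable Diffusion), and the function name `insert_object`, its parameters, and the `(x_min, y_min, x_max, y_max)` box convention are assumptions chosen for illustration.

```python
import numpy as np


def insert_object(background, patch, mask, top, left):
    """Composite `patch` into `background` and return (image, bounding box).

    Illustrative stand-in only: simple alpha compositing replaces the
    generative inpainting model. All names here are hypothetical.

    background: (H, W, 3) uint8 image
    patch:      (h, w, 3) uint8 object crop
    mask:       (h, w) array in [0, 1], non-zero where the object is present
    top, left:  pixel position of the patch's upper-left corner
    """
    h, w = mask.shape
    H, W = background.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("patch must lie fully inside the background")

    ys, xs = np.nonzero(mask > 0)
    if ys.size == 0:
        raise ValueError("mask contains no object pixels")

    # Blend in float to avoid uint8 overflow, then convert back.
    out = background.astype(np.float32)
    alpha = mask.astype(np.float32)[..., None]
    region = out[top:top + h, left:left + w]
    region[...] = alpha * patch + (1.0 - alpha) * region

    # The label follows from the mask extent, shifted to image coordinates.
    # x_max and y_max are exclusive.
    box = (
        int(left + xs.min()),
        int(top + ys.min()),
        int(left + xs.max() + 1),
        int(top + ys.max() + 1),
    )
    return np.rint(out).astype(background.dtype), box


if __name__ == "__main__":
    bg = np.zeros((64, 64, 3), dtype=np.uint8)
    obj = np.full((8, 8, 3), 255, dtype=np.uint8)
    m = np.zeros((8, 8), dtype=np.float32)
    m[2:6, 1:7] = 1.0  # object occupies only part of the patch
    image, bbox = insert_object(bg, obj, m, top=10, left=20)
    print(bbox)
```

Because the box is computed from the mask rather than from the patch rectangle, the label stays tight around the object even when the patch contains padding. A generative model that repaints the whole masked region would not necessarily preserve this property.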
