Bootstrapping Corner Cases: High-Resolution Inpainting for Safety Critical Detect and Avoid for Automated Flying

Jonathan Lyhs, Lars Hinneburg, Michael Fischer, Florian Ölsner, Stefan Milz, Jeremy Tschirner, Patrick Mäder

TL;DR

The paper tackles data scarcity in safety-critical Detect and Avoid for drones by introducing an inpainting-based pipeline that generates high-resolution, labeled images with airborne objects inserted into real backgrounds. It evaluates two object-synthesis approaches, Pix2Pix (cGAN-based) and Stable Diffusion (latent diffusion), within a unified data-generation workflow and highlights each method's strengths and limitations. Pix2Pix provides accurate ground-truth bounding boxes with faster, more predictable labeling, while diffusion offers higher visual realism but yields coarser ground-truth boxes that require adaptation. The study shows that large synthetic datasets can be produced on consumer hardware, and it reveals a substantial domain gap when a detector trained on real data is applied to the synthetic data, motivating future fine-tuning and pre-training to improve detector robustness and generalization.

Abstract

Modern machine learning techniques have shown tremendous potential, especially for object detection on camera images. For this reason, they are also used to enable safety-critical automated processes such as autonomous drone flights. We present a study on object detection for Detect and Avoid, a safety-critical function that detects air traffic during automated drone flights. A central difficulty is generating good, and especially large, datasets, since the detection event itself is the corner case. Most models suffer from limited ground truth in the raw data, e.g., recorded air traffic or a frontal approach by a small aircraft. This often leads to poor, and therefore critical, detection rates. We overcome this problem by using inpainting methods to bootstrap the dataset so that it explicitly contains the corner cases of the raw data. We provide an overview of inpainting methods and generative models and present an example pipeline that starts from a small annotated dataset. We validate our method by generating a high-resolution dataset, which we make publicly available, and presenting it to an independent object detector that was trained entirely on real data.

Paper Structure (10 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Airborne Object Detection. The high-resolution image shows a small airplane, a common situation in Airborne Object Detection for Sense and Avoid functionality. The airplane, including its ground-truth bounding box, was generated using our proposed data generation pipeline.
  • Figure 2: Data Generation Pipeline. Cropping the masked inpainting area from the background is the first step of our proposed pipeline. The cropped image is then passed into the model specific conditioning of the following image synthesis process. The last step is merging the generated image back into the background.
  • Figure 3: Proposed Pix2Pix Training. The generator and the discriminator are trained together, so that the generator learns to produce an image close to the ground truth $y$. Part of the generator input is replaced by a mask. The discriminator learns to decide whether an input is fake or real. [pix2pix2017]
  • Figure 4: Data Generation Results. The images show selected example outputs of the proposed data generation pipeline. The left column was generated using Stable Diffusion (SD) for inpainting; the right column was generated using Pix2Pix.
  • Figure 5: Data Generation Results. The images show close-ups of selected samples of the generated data set. The classes of the objects are mentioned in the captions. The upper row shows qualitatively worse results compared to the lower row.
  • ...and 1 more figures
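
The three steps named in the Figure 2 caption (crop the masked area, synthesize the object, merge the result back) can be illustrated with a minimal sketch. This is not the authors' implementation: all function names are hypothetical, the box format `(x0, y0, x1, y1)` is an assumption, and `synthesize_object` is a placeholder that paints the masked pixels a flat gray where a real pipeline would call a generative model such as Pix2Pix or Stable Diffusion. The label is derived here from the mask extent, which is one simple way to obtain a tight bounding box.

```python
import numpy as np

def crop_region(background, box):
    """Step 1: cut the inpainting area out of the high-resolution background."""
    x0, y0, x1, y1 = box
    return background[y0:y1, x0:x1].copy()

def synthesize_object(crop, mask):
    """Step 2 (placeholder): stands in for the generative model.

    Assumption: a real model would fill the masked pixels with an aircraft;
    here they are simply set to gray so the sketch stays runnable.
    """
    out = crop.copy()
    out[mask] = 128
    return out

def merge_back(background, patch, box):
    """Step 3: paste the synthesized patch back at its original position."""
    x0, y0, x1, y1 = box
    result = background.copy()
    result[y0:y1, x0:x1] = patch
    return result

def mask_to_bbox(mask, box):
    """Derive a bounding box in full-image coordinates from the crop-local mask."""
    ys, xs = np.nonzero(mask)
    x0, y0 = box[0], box[1]
    return (int(x0 + xs.min()), int(y0 + ys.min()),
            int(x0 + xs.max() + 1), int(y0 + ys.max() + 1))

def inpaint(background, box, mask):
    """Run crop -> synthesize -> merge and return the image with its label."""
    crop = crop_region(background, box)
    patch = synthesize_object(crop, mask)
    return merge_back(background, patch, box), mask_to_bbox(mask, box)

# Toy example: a black 200x100 image with a 64x64 inpainting area.
background = np.zeros((100, 200, 3), dtype=np.uint8)
box = (50, 20, 114, 84)
mask = np.zeros((64, 64), dtype=bool)
mask[10:20, 30:50] = True          # 20 px wide, 10 px tall object region
image, bbox = inpaint(background, box, mask)
```

Working on a small crop rather than the full frame keeps the generative model's input size fixed regardless of the background resolution, which is consistent with the caption's description of cropping first and merging last.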