Training Datasets Generation for Machine Learning: Application to Vision Based Navigation
Jérémy Lebreton, Ingo Ahrns, Roland Brochard, Christoph Haskamp, Hans Krüger, Matthieu Le Goff, Nicolas Menga, Nicolas Ollagnier, Ralf Regele, Francesco Capolupo, Massimo Casasco
TL;DR
The paper addresses the problem of validating machine learning for vision-based navigation in space by building a dataset-generation pipeline that combines real lunar data, laboratory experiments, high-fidelity simulations, and generative AI. It defines two use cases (an ENVISAT rendezvous and a Moon landing scenario) and a robust ground-truth methodology to benchmark pose estimation and dense optical flow. Key contributions include a SurRender-driven data generation workflow with Model Capture for realistic BRDFs, a laboratory ENVISAT dataset, a replication of lunar landing dynamics at the TRON facility, and a GAN-based domain transfer approach (CUT) to bridge synthetic and real imagery. The findings demonstrate that synthetic and laboratory-generated data can train AI models that generalize to real space imagery, particularly for optical flow, and provide a mature toolset for ESA data sharing and future extensions such as richer metadata and more diverse trajectories.
Abstract
Vision-Based Navigation uses cameras as precision sensors for GNC by extracting information from images. One of the obstacles to adopting machine learning for space applications is demonstrating that the available training datasets are adequate to validate the algorithms. The objective of this study is to generate datasets of images and metadata suitable for training machine learning algorithms. Two use cases were selected, and a robust methodology was developed to validate the datasets, including the ground truth. The first use case is in-orbit rendezvous with a man-made object: a mockup of the ENVISAT satellite. The second use case is a lunar landing scenario. Datasets were produced from archival data (Chang'e 3), in the laboratory at the DLR TRON facility and at the Airbus Robotic laboratory, with the SurRender high-fidelity image simulator using Model Capture, and with Generative Adversarial Networks. The use case definition included the selection of benchmark algorithms: an AI-based pose estimation algorithm and a dense optical flow algorithm. Finally, it is demonstrated that datasets produced with SurRender and the selected laboratory facilities are adequate to train machine learning algorithms.
