Synthetic Data Generation for Minimum-Exposure Navigation in a Time-Varying Environment using Generative AI Models
Nachiket U. Bapat, Randy C. Paffenroth, Raghvendra V. Cowlagi
TL;DR
The paper tackles synthetic data generation for autonomous navigation in a time-varying threat field when real observations are scarce. It introduces the split variational recurrent neural network (S-VRNN), which fuses a small volume of real data with noiseless support data derived from the known dynamics. It does so by splitting the latent space into two subspaces, $\kappa_1$ and $\kappa_2$, so that data-specific noise is separated from dynamics-driven structure. In numerical experiments, the S-VRNN yields synthetic samples whose distribution closely matches that of the real data. It outperforms both a purely data-driven VRNN and a split variational autoencoder (split-VAE), particularly in low-data regimes, and its samples reflect the prescribed dynamics, which are characterized by a Hurwitz matrix $A$. This dynamics-aware approach narrows the reality gap of synthetic data, which can speed up validation, planning, and digital twin development in engineering settings with limited observations.
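A matrix $A$ is Hurwitz when every eigenvalue has a strictly negative real part, so the noiseless linear dynamics $\dot{x} = A x$ decay to zero. The sketch below illustrates that property. It assumes continuous-time linear dynamics discretized with forward Euler and optional Gaussian process noise. The 2x2 matrix, step size, and function names (`eig2x2`, `is_hurwitz`, `simulate`) are hypothetical illustrations, not the paper's threat-field model.

```python
import cmath
import random


def eig2x2(a):
    """Eigenvalues of a 2x2 matrix [[p, q], [r, s]] from its characteristic polynomial."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def is_hurwitz(a):
    """True if every eigenvalue of the 2x2 matrix a has a strictly negative real part."""
    return all(lam.real < 0 for lam in eig2x2(a))


def simulate(a, x0, dt, steps, noise_std=0.0, seed=0):
    """Forward-Euler rollout of x_{k+1} = x_k + dt * A x_k + w_k, with w_k ~ N(0, noise_std^2 I).

    noise_std=0.0 gives the noiseless trajectory. The discrete rollout decays only if
    |1 + dt * lambda| < 1 for every eigenvalue lambda, so dt must be small enough.
    """
    rng = random.Random(seed)
    x = list(x0)
    traj = [tuple(x)]
    for _ in range(steps):
        ax = [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]
        x = [x[i] + dt * ax[i] + rng.gauss(0.0, noise_std) for i in range(2)]
        traj.append(tuple(x))
    return traj


if __name__ == "__main__":
    A = [[-1.0, 2.0], [-2.0, -1.0]]  # eigenvalues -1 + 2i and -1 - 2i, so A is Hurwitz
    clean = simulate(A, (1.0, 1.0), dt=0.1, steps=100)
    noisy = simulate(A, (1.0, 1.0), dt=0.1, steps=100, noise_std=0.05, seed=1)
    print(is_hurwitz(A), clean[-1], noisy[-1])
```

The noiseless rollout plays the role of structure fixed entirely by $A$, while the noisy rollout adds trajectory-specific variation on top of it. This is the kind of separation the two latent subspaces are meant to capture.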
Abstract
We study the problem of generating synthetic samples of environmental features for autonomous vehicle navigation. These features are described by a spatiotemporally varying scalar field that we refer to as a threat field. The threat field is known to have some underlying dynamics subject to process noise. Some "real-world" data consisting of observations of various threat fields are also available, and we assume that the volume of these "real-world" data is relatively small. The objective is to synthesize samples that are statistically similar to the data. The proposed solution is a generative artificial intelligence model that we refer to as a split variational recurrent neural network (S-VRNN). The S-VRNN combines a variational autoencoder, which is a widely used generative model, with a recurrent neural network, which learns temporal dependencies in data. The main innovation of this work is that we split the latent space of the S-VRNN into two subspaces. The latent variables in one subspace are learned using the "real-world" data, whereas those in the other subspace are learned using both the data and the known underlying system dynamics. Through numerical experiments, we demonstrate that the proposed S-VRNN can synthesize data that are statistically similar to the training data even when the volume of "real-world" training data is very small.
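The split latent space can be sketched at the level of a single time step. An encoder outputs a mean and a log-variance for the full latent vector. These are split into a $\kappa_1$ block and a $\kappa_2$ block, each block is sampled with the standard VAE reparameterization trick, and each block gets its own Kullback-Leibler (KL) regularizer. This is a minimal sketch under loudly stated assumptions. It omits the encoder, decoder, and recurrent hidden state. The choice of a standard normal prior for $\kappa_1$ and a prior for $\kappa_2$ whose mean is one Euler step of hypothetical linear dynamics $\dot{\kappa}_2 = A \kappa_2$ is purely illustrative, not the paper's loss. All function names are hypothetical.

```python
import math
import random


def split_latent(mu, logvar, d1):
    """Split encoder outputs into the kappa_1 block (first d1 entries) and the kappa_2 block."""
    return (mu[:d1], logvar[:d1]), (mu[d1:], logvar[d1:])


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_diag_gaussian(mu, logvar, prior_mu):
    """KL( N(mu, diag(exp(logvar))) || N(prior_mu, I) ), summed over dimensions."""
    return sum(
        0.5 * (math.exp(lv) + (m - pm) ** 2 - 1.0 - lv)
        for m, lv, pm in zip(mu, logvar, prior_mu)
    )


def dynamics_prior(prev_kappa2, a, dt):
    """Hypothetical dynamics-informed prior mean: one forward-Euler step of k' = A k."""
    return [
        k + dt * sum(a_ij * k_j for a_ij, k_j in zip(row, prev_kappa2))
        for k, row in zip(prev_kappa2, a)
    ]


def split_kl(mu, logvar, d1, kappa2_prior_mu):
    """Per-subspace KL terms: kappa_1 against N(0, I), kappa_2 against N(kappa2_prior_mu, I)."""
    (mu1, lv1), (mu2, lv2) = split_latent(mu, logvar, d1)
    kl1 = kl_diag_gaussian(mu1, lv1, [0.0] * len(mu1))
    kl2 = kl_diag_gaussian(mu2, lv2, kappa2_prior_mu)
    return kl1, kl2


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.1, -0.2, 0.9, 1.1], [-1.0, -1.0, -2.0, -2.0]  # toy encoder outputs
    A = [[-1.0, 2.0], [-2.0, -1.0]]  # hypothetical Hurwitz dynamics for kappa_2
    prior = dynamics_prior([1.0, 1.0], A, dt=0.1)
    z = reparameterize(mu, logvar, rng)  # z[:2] is kappa_1, z[2:] is kappa_2
    print(z, split_kl(mu, logvar, 2, prior))
```

In a full model the two KL terms would be added to a reconstruction loss. Keeping them separate is what lets one subspace absorb data-specific variation while the other stays tied to the dynamics. Any weighting between the terms would be a further design choice not specified here.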
