Self-supervised diffusion model fine-tuning for costate initialization using Markov chain Monte Carlo
Jannik Graebner, Ryne Beeson
TL;DR
The paper tackles the challenge of initializing costates for long-duration, low-thrust spacecraft transfers in multibody settings by learning a Pareto-aware distribution of costate initializations. It advances a self-supervised framework that couples diffusion-based sampling with Markov Chain Monte Carlo (MCMC) refinement and reward-weighted fine-tuning to complete Pareto fronts without extensive solver-generated data. The approach is demonstrated on Jupiter-Europa and Saturn-Titan CR3BP transfers, showing significant improvements in feasibility and Pareto-front coverage compared with prior methods and adjoint-control techniques. The combination of a baseline diffusion model, MCMC-driven data generation, and reward-guided training enables rapid, scalable generation of high-quality, Pareto-optimal costate initializations for indirect trajectory optimization, with practical implications for early mission design and global search in complex dynamical environments.
Abstract
Global search and optimization of long-duration, low-thrust spacecraft trajectories with the indirect method is challenging due to a complex solution space and the difficulty of generating good initial guesses for the costate variables. This is particularly true in multibody environments. Given data that reveals a partial Pareto optimal front, it is desirable to find a flexible way to complete that front and to find fronts for related trajectory problems. In this work we use conditional diffusion models to represent the distribution of candidate optimal trajectory solutions. We then introduce into this framework the novel approach of using Markov Chain Monte Carlo (MCMC) algorithms with self-supervised fine-tuning to achieve these goals. Specifically, a random walk Metropolis algorithm is employed to propose new data, which is then used to fine-tune the diffusion model through reward-weighted training based on efficient evaluations of constraint violations and mission objective functions. The framework removes the need for separate, focused, and often tedious data generation phases. Numerical experiments are presented for two problems, demonstrating the ability to improve sample quality and to explicitly target Pareto optimality based on the theory of Markov chains. The first problem is a transfer in the Jupiter-Europa circular restricted three-body problem, where the MCMC approach completes a partial Pareto front. The second problem is a Saturn-Titan transfer, where the MCMC self-supervised fine-tuning method, starting from the Jupiter-Europa case, generates a denser and superior Pareto front compared with a separate dedicated global search.
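The random walk Metropolis step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the reward function `toy_reward`, the target density proportional to exp(reward / temperature), the step size, the temperature, and the softmax-style `reward_weights` are all assumptions made for illustration. In the paper the reward would instead come from evaluations of constraint violations and mission objectives, and the weights would enter the diffusion model's training loss.

```python
import numpy as np


def toy_reward(x):
    """Hypothetical reward: a stand-in for a score built from constraint
    violation and mission objective. Peaks at x = 1 in every dimension."""
    return -np.sum((x - 1.0) ** 2)


def random_walk_metropolis(reward_fn, x0, n_steps=2000, step_size=0.3,
                           temperature=0.1, seed=0):
    """Sample from a density proportional to exp(reward / temperature)
    using a symmetric Gaussian random walk proposal (assumed target)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    r = reward_fn(x)
    samples = np.empty((n_steps, x.size))
    rewards = np.empty(n_steps)
    n_accept = 0
    for i in range(n_steps):
        proposal = x + step_size * rng.standard_normal(x.size)
        r_prop = reward_fn(proposal)
        # Symmetric proposal, so the log acceptance ratio is just the
        # difference in log target density.
        if np.log(rng.uniform()) < (r_prop - r) / temperature:
            x, r = proposal, r_prop
            n_accept += 1
        samples[i] = x
        rewards[i] = r
    return samples, rewards, n_accept / n_steps


def reward_weights(rewards, temperature=0.1):
    """Normalized, numerically stable exponential weights; one assumed way
    to weight samples in a reward-weighted training loss."""
    w = np.exp((rewards - rewards.max()) / temperature)
    return w / w.sum()


if __name__ == "__main__":
    samples, rewards, acc = random_walk_metropolis(toy_reward, np.zeros(3))
    weights = reward_weights(rewards)
    print("acceptance rate:", acc)
    print("mean of second half of chain:", samples[1000:].mean(axis=0))
```

In this toy setting the chain drifts from the starting point toward the high-reward region, and the accepted states play the role of the newly proposed data that would be fed back into fine-tuning.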
