Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning
Amit Jain, Victor Rodriguez-Fernandez, Richard Linares
TL;DR
The paper addresses autonomous spacecraft guidance across mission phases by introducing a transformer-based reinforcement learning framework that unifies control in a single adaptive policy. A Gated Transformer-XL memory module integrated with Proximal Policy Optimization preserves extended temporal context and removes the need for explicit phase switching, handling the discontinuities introduced by staging and atmosphere-to-vacuum transitions. Validation covers the double integrator and Van der Pol oscillator benchmarks, multi-phase waypoint tasks, and a four-phase rocket ascent, with near-optimal performance in the simple cases and close alignment with pseudospectral optima in the ascent scenario. The results point to a scalable, phase-agnostic strategy for autonomous mission planning that could reduce reliance on phase-specific controllers while remaining compatible with safety-critical verification protocols.
Abstract
Autonomous spacecraft control across mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge because it requires adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multi-phase waypoint navigation variants, and finally tackling a complex multi-phase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also learns coherent control policies across dynamically distinct regimes. This establishes a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.
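As an illustration of the PPO component named in the abstract, the following is a minimal sketch of the standard clipped surrogate objective from the general PPO literature. It is not the authors' implementation: the function name `ppo_clipped_surrogate`, its argument names, and the default clip range of 0.2 are assumptions made for this example, and the abstract does not state the hyperparameters actually used.

```python
import math


def ppo_clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    new_logp, old_logp: log-probabilities of the taken actions under the
        current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clip range; 0.2 is an assumed default, not a value from the paper.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if len(new_logp) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(nl - ol)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(new_logp)


# Usage: two samples whose ratios (1.5 and 0.5) fall outside the clip range.
value = ppo_clipped_surrogate(
    [math.log(1.5), math.log(0.5)], [0.0, 0.0], [1.0, -1.0]
)
```

The objective is maximized during training. The clipping limits how far a single update can move the policy away from the one that collected the data, whichever memory module (recurrent or transformer-based) produces the action distribution.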
