Multi-Phase Spacecraft Trajectory Optimization via Transformer-Based Reinforcement Learning

Amit Jain, Victor Rodriguez-Fernandez, Richard Linares

TL;DR

The paper tackles the challenge of cross-phase autonomous spacecraft guidance by introducing a transformer-based reinforcement learning framework that unifies control across mission phases with a single adaptive policy. Using a Gated Transformer-XL memory module integrated with Proximal Policy Optimization, the approach preserves extended temporal context and eliminates explicit phase switching, handling discontinuities from staging and atmospheric-vacuum transitions. Validation spans the double integrator and Van der Pol oscillator benchmarks, multiphase waypoint tasks, and a four-phase rocket ascent, achieving near-optimal performance in simple cases and close alignment with pseudospectral optima in the complex ascent scenario. This demonstrates a scalable, phase-agnostic strategy for autonomous mission planning, potentially reducing reliance on phase-specific controllers while preserving safety and verification compatibility.
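A Gated Transformer-XL differs from a standard Transformer-XL mainly in how each sublayer's output is merged back into the residual stream. Instead of a plain sum, a learned gate decides how much of the update to admit. The NumPy sketch below shows the GRU-type gate from the original GTrXL work (Parisotto et al., 2020). It assumes that this is the gating variant used here, which the summary does not confirm. The class name `GRUGate`, the weight initialization, and the default bias of 2.0 are illustrative choices, not details taken from this paper. Layer normalization and the attention sublayer itself are omitted.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class GRUGate:
    """Hypothetical GRU-type gate merging a sublayer output y into the residual stream x.

    g(x, y) = (1 - z) * x + z * h, following the GRU-style gating described for GTrXL.
    """

    def __init__(self, d, gate_bias=2.0, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d)
        # One (W, U) weight pair per component: reset r, update z, candidate h
        self.W = {k: rng.normal(0.0, scale, (d, d)) for k in "rzh"}
        self.U = {k: rng.normal(0.0, scale, (d, d)) for k in "rzh"}
        # Positive bias pushes z toward 0 at init, so g(x, y) starts close to x
        self.gate_bias = gate_bias

    def __call__(self, x, y):
        r = sigmoid(self.W["r"] @ y + self.U["r"] @ x)                   # reset gate
        z = sigmoid(self.W["z"] @ y + self.U["z"] @ x - self.gate_bias)  # update gate
        h = np.tanh(self.W["h"] @ y + self.U["h"] @ (r * x))             # candidate state
        return (1.0 - z) * x + z * h


# Usage: gate a stand-in sublayer output into a 16-dimensional residual stream
d = 16
gate = GRUGate(d)
rng = np.random.default_rng(1)
x = rng.normal(size=d)  # residual-stream input
y = rng.normal(size=d)  # stand-in for the attention or MLP sublayer output
out = gate(x, y)
print(out.shape)  # (16,)
```

At initialization, the positive bias keeps the update gate z near 0. The layer therefore starts close to an identity map on x, and the GTrXL authors credit this property for more stable reinforcement learning training.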

Abstract

Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multiphase waypoint navigation variants, and finally tackling a complex multiphase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only closely matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes, establishing a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.
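The policy term that PPO optimizes is the clipped surrogate objective of Schulman et al. (2017). The sketch below computes that term over a batch in NumPy. `ppo_clip_loss` is a hypothetical helper, and the clip range of 0.2 is the common default rather than a value reported in this summary. A full PPO update would also include a value-function loss and usually an entropy bonus. In this framework, the log-probabilities would come from the GTrXL-based policy.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective (a loss to be minimized).

    logp_new:   log pi_theta(a_t | s_t) under the current policy
    logp_old:   log pi_theta_old(a_t | s_t) recorded when the data was collected
    advantages: advantage estimates A_t (e.g. from GAE)
    """
    ratio = np.exp(logp_new - logp_old)  # probability ratio r_t(theta)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the smaller of the two surrogate terms per sample
    return -np.mean(np.minimum(unclipped, clipped))


# Usage: ratio 1.5 with a positive advantage is clipped to 1.2; with a negative
# advantage the unclipped (worse) term -1.5 is kept
loss = ppo_clip_loss(np.log([1.5, 1.5]), np.zeros(2), np.array([1.0, -1.0]))
print(round(loss, 6))  # 0.15
```

Taking the minimum makes the objective pessimistic. Once the ratio leaves [1 − ε, 1 + ε] in the direction the advantage favors, further change earns no additional objective, which keeps each update close to the data-collecting policy.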

Paper Structure

This paper contains 15 sections, 18 equations, 12 figures, 4 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Comparison of traditional segmented spacecraft control approaches (left) versus our proposed transformer-based reinforcement learning framework (right).
  • Figure 2: Transformer-based reinforcement learning architecture for multiphase spacecraft control. The Gated Transformer-XL maintains episodic memory across mission phases, enabling seamless adaptation without explicit phase switching.
  • Figure 3: Training metrics for the RL policy on the double integrator system.
  • Figure 4: Comparison of RL policy (red) and the analytical LQR solution (blue) trajectories for the double integrator system.
  • Figure 5: Cost comparison between the RL policy and the analytical LQR solution across 10 random test cases.
  • ...and 7 more figures
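Figures 4 and 5 benchmark the RL policy against an analytical LQR solution for the double integrator. The sketch below illustrates how such a baseline can be computed. It discretizes a double integrator exactly under a zero-order hold, solves the discrete algebraic Riccati equation by fixed-point iteration, and checks the closed-loop cost from random initial states against the optimal value x0ᵀ P x0. The time step, the weights Q and R, and the helper names `dlqr` and `rollout_cost` are assumptions for illustration. The summary does not give the paper's exact setup, such as continuous versus discrete time, horizon, or weights.

```python
import numpy as np


def dlqr(A, B, Q, R, iters=10_000, tol=1e-12):
    """Hypothetical helper: infinite-horizon discrete LQR via Riccati fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)  # K = S^{-1} B^T P A
        P_next = Q + A.T @ P @ (A - B @ K)   # Riccati update
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


def rollout_cost(A, B, Q, R, K, x0, steps=500):
    """Hypothetical helper: sum of x^T Q x + u^T R u under the feedback law u = -K x."""
    x = np.asarray(x0, dtype=float)
    total = 0.0
    for _ in range(steps):
        u = -K @ x
        total += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return total


# Assumed double-integrator setup: state [position, velocity], input = acceleration
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])  # exact zero-order-hold discretization
Q = np.eye(2)
R = np.array([[1.0]])

K, P = dlqr(A, B, Q, R)

rng = np.random.default_rng(0)
for x0 in rng.uniform(-1.0, 1.0, size=(10, 2)):  # 10 random initial states
    print(f"rollout {rollout_cost(A, B, Q, R, K, x0):.6f}  optimal {x0 @ P @ x0:.6f}")
```

`scipy.linalg.solve_discrete_are` solves the same Riccati equation directly. The plain iteration is used here only to keep the example dependent on NumPy alone. An RL policy's rollout cost from the same initial states could be compared against x0ᵀ P x0 in the same way.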