Adaptive Guidance with Reinforcement Meta-Learning

Brian Gaudet, Richard Linares

TL;DR

Adaptive spacecraft guidance under time-varying, imperfectly modeled dynamics is addressed by reinforcement meta-learning with a recurrent policy and value function trained with PPO. The approach enables online adaptation to unobserved forces and mass changes, demonstrated across asteroid and Mars landing tasks including radar-altimeter navigation and engine-failure scenarios. Compared with DR/DV and non-recurrent RL baselines, recurrent policies consistently achieve safer landings and better fuel efficiency, especially as dynamics vary or become highly uncertain. The results support real-time adaptive guidance with integrated navigation, offering robustness for future deep-space missions.
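This summary does not spell out the PPO training objective. For reference, the sketch below implements the standard clipped surrogate objective of PPO (Schulman et al.), L^CLIP = mean over t of min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t), where r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t). The function name, the default eps = 0.2, and the use of precomputed advantages are assumptions. The authors' exact PPO variant, value-function loss, and hyperparameters may differ.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean clipped surrogate objective L^CLIP (to be maximized).

    Hypothetical helper: logp_* are log-probabilities of the actions actually
    taken under the new and old policies; advantages are precomputed estimates.
    """
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, when the old and new log-probabilities are identical, every ratio is 1 and the objective reduces to the mean advantage. The clipping keeps each policy update close to the policy that collected the data.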

Abstract

This paper proposes a novel adaptive guidance system developed using reinforcement meta-learning with a recurrent policy and value function approximator. The use of recurrent network layers allows the deployed policy to adapt in real time to environmental forces acting on the agent. We compare the performance of the DR/DV guidance law, an RL agent with a non-recurrent policy, and an RL agent with a recurrent policy in four difficult tasks with unknown but highly variable dynamics. These tasks include a safe Mars landing with random engine failure and a landing on an asteroid with unknown environmental dynamics. We also demonstrate the ability of a recurrent policy to navigate using only Doppler radar altimeter returns, thus integrating guidance and navigation.
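The sketch below shows why recurrence enables this kind of adaptation. It assumes a single GRU layer, although the summary does not say which recurrent cell, layer sizes, or output head the authors use. The class RecurrentPolicy and its parameters are hypothetical. The hidden state h carries information from earlier observations, so the same current observation can produce different actions depending on what the agent has experienced during the episode. This is how unobserved forces or mass changes can be inferred implicitly. The weights are random and untrained, and biases are omitted for brevity.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class RecurrentPolicy:
    """Toy GRU policy (hypothetical): h_t = GRU(obs_t, h_{t-1}), action = tanh(W_out h_t)."""

    def __init__(self, obs_dim: int, hidden_dim: int, act_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        d_in = obs_dim + hidden_dim          # gates act on [obs, h] concatenated
        self.W_z = init(hidden_dim, d_in)    # update gate weights
        self.W_r = init(hidden_dim, d_in)    # reset gate weights
        self.W_h = init(hidden_dim, d_in)    # candidate-state weights
        self.W_out = init(act_dim, hidden_dim)
        self.hidden_dim = hidden_dim
        self.reset()

    def reset(self) -> None:
        """Clear memory at the start of an episode (new, unknown dynamics)."""
        self.h = [0.0] * self.hidden_dim

    def step(self, obs: Vector) -> Vector:
        xh = list(obs) + self.h
        z = [_sigmoid(a) for a in _matvec(self.W_z, xh)]
        r = [_sigmoid(a) for a in _matvec(self.W_r, xh)]
        # Candidate state sees the observation and the reset-gated memory
        xrh = list(obs) + [ri * hi for ri, hi in zip(r, self.h)]
        h_tilde = [math.tanh(a) for a in _matvec(self.W_h, xrh)]
        # Blend old memory and candidate according to the update gate
        self.h = [(1.0 - zi) * hi + zi * hti
                  for zi, hi, hti in zip(z, self.h, h_tilde)]
        return [math.tanh(a) for a in _matvec(self.W_out, self.h)]
```

Calling reset() at the start of each episode matters for meta-learning. Each episode may sample different dynamics, so the policy must re-infer them from its own history rather than carry memory over from a previous environment. During training, the forward pass would be unrolled over the episode (cf. Figure 1) so that gradients flow through time. This sketch covers inference only.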

Paper Structure

This paper contains 13 sections, 17 equations, 7 figures, 13 tables.

Figures (7)

  • Figure 1: Unrolling the forward pass in Time
  • Figure 2: Agent-Environment Interface
  • Figure 3: Experiment 1: Typical Trajectory for Asteroid Landing
  • Figure 4: Experiment 1: Learning Curves
  • Figure 5: Experiment 2: Digital Terrain Map
  • ...and 2 more figures