Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls

Nathaniel Hamilton, Kyle Dunlap, Kerianne L. Hobbs

TL;DR

This study investigates how the choice of action space (discrete versus continuous) and its granularity affect deep reinforcement learning for autonomous space control tasks. Using PPO, the authors compare learning dynamics and performance on two representative missions: inspection around a chief object and docking to a second spacecraft, with fuel efficiency, measured by $\Delta V$, as a key objective. They run an ablation across action configurations (3–101 discrete choices and varying $u_{\max}$) over multiple seeds and timesteps, reporting IQM-based metrics. The results show that enabling a no-thrust option or reducing action magnitude generally lowers fuel use, but the optimal configuration is task-dependent: three discrete actions with small magnitude perform best for inspection, while continuous actions with small $u_{\max}$ perform best for docking. Overall, no single balance between discrete and continuous actions is optimal for both tasks; future work will extend to six-DOF dynamics and scheduled thrust strategies, with implications for practical autonomous space operations.
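To make the ablation variables concrete, the sketch below builds a discrete set of thrust choices from a number of choices and a maximum magnitude $u_{\max}$. It assumes the choices are evenly spaced and symmetric about zero, which this summary does not confirm; the function name `discrete_thrust_choices` and its parameters are hypothetical and chosen only for illustration.

```python
def discrete_thrust_choices(n_choices: int, u_max: float) -> list[float]:
    """Return n_choices thrust values evenly spaced over [-u_max, u_max].

    Assumption (not taken from the paper): even spacing, symmetric about zero.
    An odd n_choices guarantees that the middle entry is exactly zero thrust.
    """
    if n_choices < 3 or n_choices % 2 == 0:
        raise ValueError("n_choices must be odd and at least 3 to include zero thrust")
    if u_max <= 0:
        raise ValueError("u_max must be positive")
    half = (n_choices - 1) // 2
    # Index `half` maps to 0.0; indices below/above map to negative/positive thrust.
    return [u_max * (i - half) / half for i in range(n_choices)]


# Coarsest configuration: full negative thrust, no thrust, full positive thrust.
print(discrete_thrust_choices(3, 1.0))  # -> [-1.0, 0.0, 1.0]

# Finest discrete configuration in the ablation: 101 choices.
fine = discrete_thrust_choices(101, 1.0)
print(len(fine), fine[50])  # -> 101 0.0
```

With three choices the agent can only thrust at full magnitude or not at all, which resembles on/off control; as the number of choices grows, the set approaches a bounded continuous range.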

Abstract

For many space applications, traditional control methods are often used during operation. However, as the number of space assets continues to grow, autonomous operation can enable rapid development of control methods for different space-related tasks. One method of developing autonomous control is Reinforcement Learning (RL), which has become increasingly popular after demonstrating promising performance and success across many complex tasks. While it is common for RL agents to learn bounded continuous control values, this may not be realistic or practical for many space tasks that traditionally prefer an on/off approach for control. This paper analyzes the use of discrete action spaces, where the agent must choose from a predefined list of actions. The experiments explore how the number of choices provided to the agents affects their measured performance during and after training. This analysis is conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock" with a low relative speed. A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel. Our results show that a limited number of discrete choices leads to optimal performance for the inspection task, while continuous control leads to optimal performance for the docking task.

Paper Structure (24 sections, 6 equations, 15 figures, 6 tables)

Figures (15)

  • Figure 1: Deputy spacecraft navigating around a chief spacecraft in Hill's Frame.
  • Figure 2: Comparison of $\Delta V$ (m/s) for all final policies. Each marker represents the IQM of 100 trials, and the shaded region is the $95\%$ confidence interval about the IQM.
  • Figure 3: Comparison of actions taken in each environment. Each histogram shows the percentage of actions used in the experiments, with the policies trained with continuous actions divided into 101 discrete intervals for comparison.
  • Figure 4: Comparison of the total reward for the final policies in the inspection environment. Each marker represents the IQM of 100 trials, and the shaded region is the $95\%$ confidence interval about the IQM.
  • Figure 5: Comparison of the success rate for the final policies in the docking environment. Each marker represents the IQM of 100 trials, and the shaded region is the $95\%$ confidence interval about the IQM.
  • ...and 10 more figures
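
The captions above report each marker as the IQM (interquartile mean) of 100 trials. As a minimal sketch of that statistic, the function below sorts the per-trial values, discards the lowest and highest quarter, and averages what remains. The function name `interquartile_mean` is hypothetical, and truncating the cut to a whole number of samples is an assumption; the authors' exact tooling is not stated in this summary.

```python
def interquartile_mean(values) -> float:
    """Mean of the middle 50% of the data.

    Sketch only: drops floor(n / 4) samples from each end of the sorted data
    and averages the remainder (a 25% trimmed mean).
    """
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    cut = len(data) // 4
    middle = data[cut:len(data) - cut]
    return sum(middle) / len(middle)


# A single extreme trial barely moves the IQM, unlike the plain mean.
trials = [3.0, 4.0, 5.0, 6.0, 1.0, 2.0, 7.0, 1000.0]
print(interquartile_mean(trials))  # -> 4.5
print(sum(trials) / len(trials))   # plain mean, dominated by the outlier
```

With 100 trials, this drops the 25 lowest and 25 highest values and averages the middle 50, which makes the reported metric less sensitive to a few unusually good or bad episodes.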