Investigating the Impact of Observation Space Design Choices On Training Reinforcement Learning Solutions for Spacecraft Problems

Nathaniel Hamilton, Kyle Dunlap, Kerianne L Hobbs

TL;DR

This paper investigates how observation-space design influences reinforcement learning for spacecraft inspection tasks, extending prior work on action-space design. It evaluates two dimensions: sensor-based augmentations to the observation vector and frame definitions (chief-centered Hill's frame vs. agent-centered representations) using PPO in a Hill's-frame-relative dynamics environment. Key findings show that RL can learn to complete inspections without extra sensors, but sensors like Sun Angle and UPS yield more optimal and consistent behavior, while the Count sensor can hinder learning; changing to an agent-centered frame provides only minor, transient advantages. The results guide observation-space design for autonomous space operations and point to future work in more complex six-DOF dynamics and multi-agent scenarios.

Abstract

Recent research using Reinforcement Learning (RL) to learn autonomous control for spacecraft operations has shown great success. However, a recent study showed that the performance of these RL agents could be improved by changing the action space, i.e., the control outputs, used in the learning environment. This has opened the door to finding further improvements through other changes to the environment. The work in this paper focuses on how changes to the environment's observation space can impact the training and performance of RL agents learning the spacecraft inspection task. The studies are split into two groups. The first looks at the impact of sensors that were designed to help agents learn the task. The second looks at the impact of reference frames, reorienting the agent to see the world from a different perspective. The results show that the sensors are not necessary, but most of them help agents learn more optimal behavior, and that the choice of reference frame does not have a large impact, but is best kept consistent.

Paper Structure (15 sections, 5 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: DRL training loop from hamilton2023ablation.
  • Figure 2: Deputy spacecraft navigating around a chief spacecraft in Hill's Frame from dunlap2024run.
  • Figure 3: Final policy performance plots showing the expected level of performance of the final trained models measured in total reward and success rate. The line represents the interquartile mean and the shaded region is the 95% confidence interval.
  • Figure 4: Final policy performance plots showing the expected level of performance of the final trained models measured in number of inspected points, fuel usage, and episode length. The line represents the interquartile mean and the shaded region is the 95% confidence interval.
  • Figure 5: Example episodes using policies trained with the No Sensors configuration.
  • ...and 5 more figures
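
The captions for Figures 3 and 4 report an interquartile mean with a 95% confidence interval. As a minimal sketch of what those two quantities are, the snippet below computes the interquartile mean (the mean of the middle 50% of sorted values) and a percentile-bootstrap interval around it. The bootstrap method, the function names, and the sample rewards are assumptions made for illustration; the page does not state how the authors computed their intervals.

```python
import numpy as np


def interquartile_mean(values):
    """Mean of the middle 50% of the sorted values (bottom and top 25% dropped)."""
    x = np.sort(np.asarray(values, dtype=float))
    cut = int(0.25 * x.size)  # number of samples trimmed from each end
    return float(x[cut:x.size - cut].mean())


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the interquartile mean.

    Assumption: a plain percentile bootstrap; the paper may use another method.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, x.size, size=(n_resamples, x.size))
    stats = np.array([interquartile_mean(x[row]) for row in idx])
    alpha = (1.0 - level) / 2.0
    low, high = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(low), float(high)


# Hypothetical per-run total rewards, including one outlier.
rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
iqm = interquartile_mean(rewards)
ci_low, ci_high = bootstrap_ci(rewards)
print(f"IQM = {iqm:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the top and bottom quarters are discarded, the single outlier in the hypothetical data does not pull the estimate upward the way it would for a plain mean, which is one common reason for preferring this statistic when comparing a small number of training runs.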