Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing

Samantha Chapin, Kenneth Stewart, Roxana Leontie, Carl Glen Henshaw

TL;DR

The paper addresses autonomous control of free-flying space robots by demonstrating a reinforcement-learning (RL) policy, trained in NVIDIA Omniverse Isaac Lab, that operates NASA's Astrobee on the ISS. The 6-DOF policy is trained with PPO under randomized goals and mass variations to narrow the sim-to-real gap, and is then validated in Omniverse simulation, Gazebo-based simulation, Granite Lab hardware tests, and ISS flight. The results show that the RL policy can perform basic maneuvers in zero-G, with safety mechanisms that allow fallback to a baseline controller, and the paper documents both the performance gaps and the robust behavior it observed. The work demonstrates the feasibility of RL-driven autonomy for space robotics, outlines a rapid development pathway built on parallel simulation, and identifies future directions toward more complex tasks and ISAM-oriented AI&T workflows.

Abstract

The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) to control free-flying robots in the zero-gravity (zero-G) environment of space. On Tuesday, May 27, 2025, the APIARY team conducted what is, to our knowledge, the first RL control of a free-flyer in space, using the NASA Astrobee robot on board the International Space Station (ISS). A robust 6-degree-of-freedom (6-DOF) control policy was trained using an actor-critic Proximal Policy Optimization (PPO) network within the NVIDIA Isaac Lab simulation environment, randomizing over goal poses and mass distributions to enhance robustness. This paper details the simulation testing, ground testing, and flight validation of the experiment. The on-orbit demonstration validates the potential of RL to improve robotic autonomy, enabling rapid development and deployment (in minutes to hours) of tailored behaviors for space exploration, logistics, and real-time mission needs.
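The randomization over goal poses and mass mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Isaac Lab configuration: the names `EpisodeParams` and `sample_episode`, the nominal mass, the mass spread, and the goal offset range are all hypothetical placeholders, and only a scalar mass is randomized here rather than a full mass distribution.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    """Per-episode randomized parameters (hypothetical structure)."""
    goal_position: tuple    # (x, y, z) offset from the start pose, metres
    goal_quaternion: tuple  # unit quaternion, four components
    mass_kg: float          # randomized robot mass


def random_unit_quaternion(rng):
    """Draw a uniformly distributed rotation (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (
        a * math.sin(2.0 * math.pi * u2),
        a * math.cos(2.0 * math.pi * u2),
        b * math.sin(2.0 * math.pi * u3),
        b * math.cos(2.0 * math.pi * u3),
    )


def sample_episode(rng, nominal_mass_kg=10.0, mass_spread=0.2, max_offset_m=1.0):
    """Sample one training episode; all default values are assumed placeholders."""
    goal_position = tuple(
        rng.uniform(-max_offset_m, max_offset_m) for _ in range(3)
    )
    # Scale the nominal mass by a random factor in [1 - spread, 1 + spread].
    mass = nominal_mass_kg * rng.uniform(1.0 - mass_spread, 1.0 + mass_spread)
    return EpisodeParams(goal_position, random_unit_quaternion(rng), mass)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_episode(rng))
```

Drawing a fresh goal and mass at every episode reset exposes the policy to a range of dynamics during training, which is the general idea behind using randomization to narrow the sim-to-real gap.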

Paper Structure

This paper contains 14 sections, 7 figures.

Figures (7)

  • Figure 1: Flowchart of the reinforcement learning workflow from simulation to ground and flight testing. The Astrobee policy is trained in a zero-G environment, and the same policy is then tested in simulation and on hardware. If the simulation or hardware results require it, a new policy is re-trained and the cycle repeats.
  • Figure 2: Diagram of the RL training process. Astrobee learns to reach a desired end pose from observations (the robot's linear and angular velocity and its position and orientation error) and rewards (points for reducing pose error and minimizing velocity), commanding actions (force and torque) that result in Astrobee motion.
  • Figure 3: Diagram of the Astrobee motion control architecture, highlighting where the RL policy (yellow box) replaces the baseline Astrobee controller when a movement is commanded.
  • Figure 4: Granite Lab 1G Simulation Comparison. (a) Astrobee simulator starting position. (b) 3D plot of baseline (red) and RL policy (blue) controllers' motion to undock Astrobee. (c) Position error from goal pose for Astrobee 0.5 X-Axis command using baseline Astrobee controller (solid lines) vs. RL policy controller (dotted lines). (d) Orientation error from goal pose for Astrobee 0.5 X-Axis command.
  • Figure 5: ISS Zero-G Simulation Comparison. (a) Astrobee simulator starting position, docked. (b) 3D plot of baseline (red) and RL policy (blue) controllers' motion to undock Astrobee. (c) Position error from goal pose for Astrobee undock command using baseline Astrobee controller (solid lines) vs. RL policy controller (dotted lines). (d) Orientation error from goal pose for Astrobee undock command.
  • ...and 2 more figures
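The observation and reward structure described in the Figure 2 caption can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the function names, the (w, x, y, z) quaternion convention, the 13-element observation layout, and the weights `w_progress` and `w_velocity` are all hypothetical.

```python
import math


def quat_conjugate(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_multiply(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def orientation_error_angle(q_current, q_goal):
    """Smallest rotation angle (rad) between current and goal attitude."""
    w = quat_multiply(q_goal, quat_conjugate(q_current))[0]
    return 2.0 * math.acos(min(1.0, abs(w)))


def build_observation(lin_vel, ang_vel, pos, pos_goal, q_current, q_goal):
    """Concatenate velocities with position and orientation error."""
    pos_err = [g - p for g, p in zip(pos_goal, pos)]
    q_err = quat_multiply(q_goal, quat_conjugate(q_current))
    return list(lin_vel) + list(ang_vel) + pos_err + list(q_err)


def pose_error(pos, pos_goal, q_current, q_goal):
    """Scalar pose error: distance (m) plus rotation angle (rad)."""
    return math.dist(pos, pos_goal) + orientation_error_angle(q_current, q_goal)


def step_reward(prev_error, error, lin_vel, ang_vel,
                w_progress=1.0, w_velocity=0.1):
    """Reward pose-error reduction and penalize velocity (assumed weights)."""
    speed = math.hypot(*lin_vel) + math.hypot(*ang_vel)
    return w_progress * (prev_error - error) - w_velocity * speed


if __name__ == "__main__":
    identity = (1.0, 0.0, 0.0, 0.0)
    obs = build_observation((0, 0, 0), (0, 0, 0),
                            (0, 0, 0), (0.5, 0, 0), identity, identity)
    print(len(obs), obs)
```

In a setup like this, the policy would map the observation vector to a 6-element action (three force and three torque components), matching the force and torque actions named in the caption.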