Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware

Kenneth Stewart, Roxana Leontie, Samantha Chapin, Joe Hays, Sumit Bam Shrestha, Carl Glen Henshaw

TL;DR

Power-constrained robotics motivates neuromorphic approaches. The authors train an actor-critic PPO policy in simulation and convert the resulting ANN into a Sigma-Delta Neural Network suitable for Loihi 2, enabling low-power, low-latency inference in a high-fidelity Astrobee simulation. Compared to a GPU-based ANN, the Loihi 2 SDNN delivers substantially lower energy use and higher throughput, with some degradation in tracking accuracy due to spike encoding and quantization. The work demonstrates a practical pathway for deploying energy-efficient neuromorphic controllers for space and terrestrial robotics, highlighting both the potential and the remaining challenges in scaling to larger policies.

Abstract

We present an end-to-end pipeline for deploying reinforcement learning (RL) trained Artificial Neural Networks (ANNs) on neuromorphic hardware by converting them into spiking Sigma-Delta Neural Networks (SDNNs). We demonstrate that an ANN policy trained entirely in simulation can be transformed into an SDNN compatible with Intel's Loihi 2 architecture, enabling low-latency and energy-efficient inference. As a test case, we use an RL policy for controlling the Astrobee free-flying robot, similar to a controller previously validated on hardware in space. The policy, trained with Rectified Linear Units (ReLUs), is converted to an SDNN and deployed on Intel's Loihi 2, then evaluated in NVIDIA's Omniverse Isaac Lab simulation environment for closed-loop control of Astrobee's motion. We compare execution performance between GPU and Loihi 2. The results highlight the feasibility of using neuromorphic platforms for robotic control and establish a pathway toward energy-efficient, real-time neuromorphic computation in future space and terrestrial robotics applications.
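
The sigma-delta idea behind the ANN-to-SDNN conversion can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline and it uses no Loihi 2 or Lava API; the function names, the threshold of 0.1, and the test signal are all assumptions chosen for illustration. A delta encoder transmits a graded message only when an activation has changed by at least a threshold since the last transmission, and a sigma decoder accumulates those messages to reconstruct the activation.

```python
import numpy as np


def delta_encode(signal, threshold):
    """Send a graded message only when the signal has moved by at least
    `threshold` away from the last transmitted value (hypothetical helper)."""
    reference = 0.0  # last value the receiver knows about
    messages = np.zeros(len(signal), dtype=float)
    for t, x in enumerate(signal):
        delta = x - reference
        if abs(delta) >= threshold:
            messages[t] = delta   # transmit the change
            reference += delta    # receiver is now up to date
    return messages


def sigma_decode(messages):
    """Accumulate received messages to reconstruct the signal."""
    return np.cumsum(messages)


# Illustrative input: a ReLU-rectified sine wave (an assumption, not paper data).
threshold = 0.1
signal = np.maximum(np.sin(np.linspace(0.0, 2.0 * np.pi, 200)), 0.0)

messages = delta_encode(signal, threshold)
reconstruction = sigma_decode(messages)

max_error = np.max(np.abs(signal - reconstruction))
sent = np.count_nonzero(messages)
print(f"messages sent: {sent} of {signal.size}, max error: {max_error:.4f}")
```

In this sketch the reconstruction error stays below the threshold while most time steps send nothing. A coarser threshold sends fewer messages at the cost of accuracy, which is the same kind of trade-off the TL;DR attributes to spike encoding and quantization.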

Paper Structure

This paper contains 6 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Averaged position tracking error, in meters (m), for data sets of 10 random and 10 undock (0.5-meter X-axis) movements, comparing the performance of the GPU (blue and orange lines) and Loihi 2 (light grey and dark grey lines). Solid lines show the mean; shaded regions in the matching colors show the 95% confidence interval (CI).
  • Figure 2: Averaged orientation tracking error, in degrees (deg), for data sets of 10 random and 10 undock (0.5-meter X-axis) movements, comparing the performance of the GPU (blue and orange lines) and Loihi 2 (light grey and dark grey lines). Solid lines show the mean; shaded regions in the matching colors show the 95% confidence interval (CI).
  • Figure 3: Throughput per inference, in seconds, versus energy per inference, in joules (J), for the 10 random and 10 undock (0.5-meter X-axis) movements, comparing the performance of the GPU (blue and orange) and Loihi 2 (light grey and dark grey).
  • Figure 4: Example output of the undock task: (a) ANN inference on GPU; (b) SDNN inference on Loihi 2. When the Astrobee robot overlaps the goal position and orientation arrows, the goal has been reached precisely.
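
The mean and 95% CI bands described in the captions of Figures 1 and 2 could be computed along the following lines. The source does not say how the interval was obtained, so this sketch assumes a normal approximation (critical value 1.96) applied across trials at each time step; `mean_and_ci` is a hypothetical helper and the input array is synthetic placeholder data, not results from the paper.

```python
import numpy as np


def mean_and_ci(errors, critical_value=1.96):
    """Return per-time-step mean and confidence bounds across trials.

    `errors` has shape (n_trials, n_steps). The default critical value
    assumes a normal approximation; this is an assumption, not the
    paper's stated method.
    """
    errors = np.asarray(errors, dtype=float)
    n_trials = errors.shape[0]
    mean = errors.mean(axis=0)
    # Standard error of the mean, using the sample standard deviation.
    sem = errors.std(axis=0, ddof=1) / np.sqrt(n_trials)
    half_width = critical_value * sem
    return mean, mean - half_width, mean + half_width


# Synthetic placeholder: 10 trials x 50 time steps (NOT data from the paper).
rng = np.random.default_rng(0)
synthetic_errors = np.abs(rng.normal(loc=0.05, scale=0.01, size=(10, 50)))

mean, lower, upper = mean_and_ci(synthetic_errors)
print(mean.shape, lower.shape, upper.shape)
```

With only 10 trials per data set, a Student-t critical value (2.262 for 9 degrees of freedom) would give a somewhat wider band than the normal approximation; passing `critical_value=2.262` covers that case.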