
Reinforcement Learning Controllers for Soft Robots using Learned Environments

Uljad Berdica, Matthew Jackson, Niccolò Enrico Veronese, Jakob Foerster, Perla Maiolino

TL;DR

A reinforcement learning approach to closed-loop control of soft robots using state-of-the-art actor-critic methods that efficiently learn high-performance behaviour over long horizons, together with a safety-oriented actuation-space exploration protocol based on cascaded updates and weighted randomness.

Abstract

Soft robotic manipulators offer operational advantages due to their compliant and deformable structures. However, their inherently nonlinear dynamics present substantial control challenges. Traditional analytical methods often depend on simplifying assumptions, while learning-based techniques can be computationally demanding and restrict control policies to existing data. This paper introduces a novel approach to soft robotic control that leverages state-of-the-art policy gradient methods within parallelizable synthetic environments learned from data. We also propose a safety-oriented actuation-space exploration protocol based on cascaded updates and weighted randomness. Specifically, our recurrent forward dynamics model is learned from a training dataset generated by a physically safe, mean-reverting random walk in actuation space that explores the partially observed state space. We demonstrate a reinforcement learning approach to closed-loop control using state-of-the-art actor-critic methods, which efficiently learn high-performance behaviour over long horizons. This approach removes the need for any prior knowledge of the robot's operation or capabilities and lays the groundwork for a comprehensive benchmarking tool for soft robotic control.
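The abstract does not spell out the update rule of the mean-reverting random walk. As a minimal sketch, assuming an Ornstein-Uhlenbeck-style discrete update (a standard mean-reverting process swapped in here, not necessarily the authors' protocol), each chamber pressure drifts back toward the 2 kPa baseline of Figure 3 while a noise term scaled by the exploration weight alpha (Figure 4) pushes it away, and every command is clipped to a safe range. The function name `mean_reverting_walk`, the parameters `theta` and `sigma_kpa`, the pressure bounds, and the role assigned to alpha are all hypothetical; the cascaded updates are not modelled.

```python
import random
from typing import List


def mean_reverting_walk(
    n_steps: int,
    n_chambers: int = 3,        # chambers A to C (Figure 3)
    baseline_kpa: float = 2.0,  # baseline pressure at the home position (Figure 3)
    min_kpa: float = 0.0,       # HYPOTHETICAL safety floor, not from the paper
    max_kpa: float = 40.0,      # HYPOTHETICAL safety ceiling, not from the paper
    theta: float = 0.1,         # HYPOTHETICAL mean-reversion rate per step
    sigma_kpa: float = 2.0,     # HYPOTHETICAL noise scale per step
    alpha: float = 0.5,         # exploration weight (ASSUMED to scale the noise)
    seed: int = 0,
) -> List[List[float]]:
    """Return n_steps pressure commands (one value per chamber, in kPa)."""
    rng = random.Random(seed)
    pressures = [baseline_kpa] * n_chambers
    trajectory: List[List[float]] = []
    for _ in range(n_steps):
        new_pressures = []
        for p in pressures:
            pull = theta * (baseline_kpa - p)         # drift back toward the baseline
            kick = alpha * rng.gauss(0.0, sigma_kpa)  # weighted random exploration
            # Clip so commands never leave the (assumed) safe actuation range.
            new_pressures.append(min(max(p + pull + kick, min_kpa), max_kpa))
        pressures = new_pressures
        trajectory.append(pressures)
    return trajectory


walk = mean_reverting_walk(n_steps=1000, alpha=0.8)
print(len(walk), len(walk[0]))  # 1000 steps, 3 chambers
```

In this sketch, a larger alpha lets the walk wander further from the home configuration, while the mean-reversion term and clipping keep commands bounded, which is the qualitative trade-off Figures 4 and 5 illustrate.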

Paper Structure

This paper contains 14 sections, 1 equation, 8 figures, and 1 algorithm.

Figures (8)

  • Figure 1: The pipeline of the learned environment-based solution proposed in this work. The recurrent network to the left represents the LSTM at the core of the synthetic environments.
  • Figure 2: Training-pair generation, with matched colors indicating corresponding pairs. For illustration, we use a sequence length of 512 with a step size of 200 on a subset of the data; in practice, we use a step size of 1 and slide through the entire runs.
  • Figure 3: Robot at its initial position: (a) no deformation at the home position, with an initial baseline pressure of 2 kPa; (b) transverse cross-section view of the root, showing pressure chambers A to C and reflective markers $O_1$ to $O_3$.
  • Figure 4: Random walk in actuation space for different values of the exploration hyperparameter $\alpha$.
  • Figure 5: Resulting trajectories from a random walk in actuation space with various levels of randomness.
  • ...and 3 more figures
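The Figure 2 caption describes training pairs generated by sliding a fixed-length window over each recorded run. A minimal sketch of that windowing follows, assuming that each target is the input window shifted one timestep ahead (next-step prediction); the caption does not say what the target is, so this pairing is an assumption. The function name `make_training_pairs` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_training_pairs(
    run: Sequence[T], seq_len: int, step: int = 1
) -> List[Tuple[List[T], List[T]]]:
    """Slide a window of length seq_len over one run, advancing by `step`.

    ASSUMPTION: each target is the input window shifted one timestep ahead
    (next-step prediction); the paper's exact pairing may differ.
    """
    if seq_len < 1 or step < 1:
        raise ValueError("seq_len and step must be positive")
    pairs = []
    # The last start index must leave room for the one-step-ahead target.
    for start in range(0, len(run) - seq_len, step):
        inputs = list(run[start : start + seq_len])
        targets = list(run[start + 1 : start + seq_len + 1])
        pairs.append((inputs, targets))
    return pairs


pairs = make_training_pairs(list(range(10)), seq_len=3)
print(pairs[0])  # ([0, 1, 2], [1, 2, 3])
```

With a step size of 1, which the caption says is used in practice, consecutive windows overlap in all but one timestep, so each run yields the largest possible number of pairs. The larger stride of 200 in the figure only makes the individual pairs visually distinguishable.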