A Q-learning Control Method for a Soft Robotic Arm Utilizing Training Data from a Rough Simulator

Peijin Li, Gaotian Wang, Hao Jiang, Yusong Jin, Yinghao Gan, Xiaoping Chen, Jianmin Ji

TL;DR

A Q-learning controller for a physical soft robot is proposed, in which models pre-trained on data from a rough simulator are used to improve the controller's performance.

Abstract

Controlling a soft robot is challenging, and reinforcement learning methods have been applied to the problem with promising results. However, because of their poor sample efficiency, these methods require a large amount of training data, which limits their applicability. In this paper, we propose a Q-learning controller for a physical soft robot, in which models pre-trained on data from a rough simulator are used to improve the controller's performance. We implement the method on our soft robot, the Honeycomb Pneumatic Network (HPN) arm. The experiments show that using pre-trained models not only reduces the amount of real-world training data required, but also greatly improves the controller's accuracy and convergence rate.
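
The idea of pre-training in a rough simulator and then fine-tuning on the real system can be illustrated with a small tabular Q-learning sketch. This is not the authors' controller: the one-dimensional reaching task, the `soft_below` overshoot that stands in for the simulator-to-reality gap, and all constants and function names below are hypothetical choices made only for illustration.

```python
import random

N_STATES = 11        # discretized tip positions 0..10 (toy values, not from the paper)
GOAL = 7             # hypothetical target cell
ACTIONS = (-1, +1)   # retract / extend by one actuation step


def step(state, action, soft_below):
    """Toy deterministic dynamics (an assumption, not the HPN arm model).

    With soft_below=0 this plays the "rough simulator": every action moves
    one cell. With soft_below>0 it stands in for the "real" arm: extending
    from a cell below `soft_below` overshoots and moves two cells.
    """
    move = action
    if action > 0 and state < soft_below:
        move = 2
    nxt = min(N_STATES - 1, max(0, state + move))
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.1), done


def greedy(q, state):
    """Index of the highest-valued action in `state`."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def q_learning(q, soft_below, episodes, rng,
               alpha=0.5, gamma=0.9, eps=0.2, max_steps=50):
    """Epsilon-greedy tabular Q-learning; updates the table `q` in place."""
    starts = [s for s in range(N_STATES) if s != GOAL]
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(q, s)
            s2, r, done = step(s, ACTIONS[a], soft_below)
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def steps_to_goal(q, start, soft_below, limit=2 * N_STATES):
    """Length of the greedy rollout from `start`, or None if the goal is missed."""
    s = start
    for n in range(1, limit + 1):
        s, _, done = step(s, ACTIONS[greedy(q, s)], soft_below)
        if done:
            return n
    return None


def pretrain_then_finetune(sim_episodes=500, real_episodes=30, seed=0):
    """Pre-train in the rough simulator, then continue learning on the "real" system."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    q_learning(q, soft_below=0, episodes=sim_episodes, rng=rng)   # rough simulator
    q_learning(q, soft_below=3, episodes=real_episodes, rng=rng)  # "real" system
    return q


if __name__ == "__main__":
    q = pretrain_then_finetune()
    print([steps_to_goal(q, s, soft_below=3) for s in range(N_STATES) if s != GOAL])
```

In this toy setting the same Q-table is simply carried over from the simulator phase to the real phase, so the few real episodes only have to correct the values that the simulator got wrong rather than learn the task from scratch.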

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures.

Figures (6)

  • Figure 1: The HPN arm with four segments. Markers on the base and the tip of the arm allow the motion capture system (MCS, Prime 17W, OptiTrack) to measure the pose of the tip with respect to the base. Each segment is composed of a deformable honeycomb structure and four groups of airbags, which can be pressurized independently. Pressurizing the four groups of airbags to different pressures makes the HPN structure deform in different ways.
  • Figure 2: Illustration of the configuration-space parameters. The red curve represents the central axis of a single segment of the HPN arm. We establish a coordinate system at $O$, the centroid of the base of the arm, with the positive $z$-axis along the tangent to the central axis at the base. $O'$ is the center of the arm's curvature, and $OO'$ is the radius of curvature, which equals the reciprocal of the curvature, $\frac{1}{K}$. $L$ is the arc length of the arm, and $\varphi$ is the angle between the plane of the arm and the $x$-$z$ plane.
  • Figure 3: The arrangement of the airbags of the arm. The four prisms represent the four segments of the arm. Taking the first segment as an example, $P_1$, $P_2$, $P_3$, and $P_4$ are the pressures of the four groups of airbags in the directions shown in the figure. Vectors $\mathbf{e_1}$ and $\mathbf{e_2}$ are the unit vectors whose angles with the positive direction of the $x$-axis are $45^{\circ}$.
  • Figure 4: Illustration of the state definition. (a) The absolute pose of the goal, represented by five parameters: $d_{goal}$, $\theta_{dgoal}$, $\varphi_{dgoal}$, $\theta_{egoal}$, $\varphi_{egoal}$. The first three parameters are the radius, azimuthal angle, and elevation angle of the goal, respectively. The last two are the azimuthal angle and the elevation angle of the orientation of the goal. (b) The pose of the tip with respect to the goal, represented by five parameters: $d_{tip}$, $\theta_{dtip}$, $\varphi_{dtip}$, $\theta_{etip}$, $\varphi_{etip}$. The first three parameters are the radius, azimuthal angle, and elevation angle of the displacement vector of the tip with respect to the goal. The last two are the azimuthal angle and the elevation angle of the orientation of the tip with respect to the goal.
  • Figure 5: Results of the point-to-point experiment for four goals. (a) and (b) show the results for the two goals where the arm mainly needs to elongate; (c) and (d) show the results for the two goals where the arm mainly needs to bend. The red dashed curves show the results of the controller without pre-training, and the blue solid curves show the results of the controller with pre-training.
  • ...and 1 more figures
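
The geometric quantities named in the captions of Figures 2 and 4 can be made concrete with a short sketch. It assumes the standard constant-curvature model for a single segment (base at $O$, tangent along $+z$, curvature $K$, arc length $L$, bending-plane angle $\varphi$) and one common convention for azimuth and elevation. Neither the function names nor the angle convention are taken from the paper, whose own definitions may differ.

```python
import math


def tip_position(K, L, phi, eps=1e-9):
    """Tip position (x, y, z) of one constant-curvature segment.

    K: curvature, L: arc length (same length unit as 1/K), phi: angle between
    the bending plane and the x-z plane, in radians. The base sits at the
    origin with its tangent along +z.
    """
    if abs(K) < eps:                    # straight segment: radius 1/K is unbounded
        return (0.0, 0.0, L)
    theta = K * L                       # angle swept by the arc about O'
    r = (1.0 - math.cos(theta)) / K     # distance of the tip from the z-axis
    z = math.sin(theta) / K
    return (r * math.cos(phi), r * math.sin(phi), z)


def to_spherical(x, y, z):
    """(radius, azimuth, elevation) of a vector; the angle convention is assumed."""
    d = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)                    # angle in the x-y plane from +x
    elevation = math.atan2(z, math.hypot(x, y))   # angle above the x-y plane
    return (d, azimuth, elevation)
```

For example, `to_spherical(*tip_position(K, L, phi))` gives a radius, azimuthal angle, and elevation angle for the tip of a single segment, which is the same kind of triple that Figure 4 uses for the position part of the goal and tip poses.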