Learning-based Nonlinear Model Predictive Control of Articulated Soft Robots using Recurrent Neural Networks

Hendrik Schäfke, Tim-Lukas Habich, Christian Muhmann, Simon F. G. Ehlers, Thomas Seel, Moritz Schappler

TL;DR

Recurrent neural networks (RNNs) based on gated recurrent units (GRUs) are compared with the more commonly used long short-term memory (LSTM) networks and show better accuracy. The proposed learning-based nonlinear model predictive control (NMPC) enables trajectory tracking with an average error of 1.2° in experiments with a pneumatic articulated soft robot (ASR) with five degrees of freedom (DoF).

Abstract

Soft robots pose difficulties in terms of control, requiring novel strategies to effectively manipulate their compliant structures. Model-based approaches face challenges due to the high dimensionality and nonlinearities such as hysteresis effects. In contrast, learning-based approaches provide nonlinear models of different soft robots based only on measured data. In this paper, recurrent neural networks (RNNs) predict the behavior of an articulated soft robot (ASR) with five degrees of freedom (DoF). RNNs based on gated recurrent units (GRUs) are compared with the more commonly used long short-term memory (LSTM) networks and show better accuracy. The recurrence captures hysteresis effects that are inherent in soft robots due to viscoelasticity or friction and that simple feedforward networks cannot capture. The data-driven model is used within nonlinear model predictive control (NMPC), with a focus on the correct handling of the RNN's hidden states. A training approach is presented that allows measured values to be utilized in each control cycle. This enables accurate predictions over short horizons based on sensor data, which is crucial for closed-loop NMPC. The proposed learning-based NMPC enables trajectory tracking with an average error of 1.2° in experiments with the pneumatic five-DoF ASR.
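
The closed-loop idea described in the abstract (a recurrent model inside NMPC, with the hidden state carried from one control cycle to the next and refreshed using measured values) can be illustrated with a toy sketch. This is not the authors' implementation: `model_step` is a hypothetical two-line stand-in for the learned RNN, the simulated plant is assumed identical to the model, and an exhaustive search over a coarse input grid replaces a real nonlinear solver. All names and numeric values are assumptions made for illustration.

```python
import itertools

def model_step(x, u, h):
    """Hypothetical stand-in for the learned RNN: next state and next hidden state.
    In a real GRU the hidden-state update would also depend on x."""
    h_next = 0.8 * h + 0.2 * u        # hidden state acts as slow internal memory
    x_next = 0.9 * x + 0.1 * h_next   # next joint angle depends on that memory
    return x_next, h_next

def rollout_cost(x0, h0, u_seq, x_des):
    """Roll the model out over the horizon and sum squared tracking errors."""
    x, h, cost = x0, h0, 0.0
    for u, xd in zip(u_seq, x_des):
        x, h = model_step(x, u, h)
        cost += (x - xd) ** 2
    return cost

def nmpc_step(x_meas, h, x_des, candidates):
    """Exhaustive search over input sequences (placeholder for a real solver).
    Only the first input of the best sequence is applied."""
    best = min(itertools.product(candidates, repeat=len(x_des)),
               key=lambda u_seq: rollout_cost(x_meas, h, u_seq, x_des))
    return best[0]

def run_closed_loop(n_cycles, x_ref, horizon=3):
    candidates = [i / 2 for i in range(-4, 5)]  # assumed input grid from -2 to 2
    x_plant, h_plant = 0.0, 0.0   # simulated plant, here identical to the model
    h = 0.0                       # hidden state carried between control cycles
    history = []
    for _ in range(n_cycles):
        x_meas = x_plant                                     # sensor reading
        u = nmpc_step(x_meas, h, [x_ref] * horizon, candidates)
        _, h = model_step(x_meas, u, h)                      # advance hidden state with the measurement
        x_plant, h_plant = model_step(x_plant, u, h_plant)   # plant moves one step
        history.append(x_plant)
    return history

history = run_closed_loop(n_cycles=40, x_ref=1.0)
print(f"final tracking error: {abs(history[-1] - 1.0):.4f}")
```

The point of the sketch is the data flow: each cycle starts its prediction from the measured state together with the hidden state produced in the previous cycle, rather than from a zero-initialized hidden state.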

Paper Structure

This paper contains 17 sections, 2 equations, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: Learning-based NMPC of a five-DoF ASR. The dynamic behavior is learned with recurrent neural networks and used as a dynamic constraint.
  • Figure 2: Soft-robot platform with $n=5$ discrete joints.
  • Figure 3: HPO results for the GRUs: Each line represents a trial (combination of $n_\mathrm{HD}$, $n_\mathrm{HL}$, $n_\mathrm{b}$, $r_\mathrm{d}$ and $\eta_\mathrm{init}$). Poorly performing trials are shown in gray, the best twenty in blue, and the best one in green. A baseline configuration is highlighted in orange. The validation loss $\mathcal{L}_\mathrm{v}$ is considerably reduced by systematically determining the optimum hyperparameters.
  • Figure 4: Block diagram of the learning-based NMPC: An RNN is used as a dynamic constraint to calculate the optimized inputs $\boldsymbol{u}$ given the desired state sequence $\boldsymbol{x}_\mathrm{des}$ for the whole prediction horizon. The green network calculates the hidden states $\boldsymbol{h}$ in each time step, which are passed to the NMPC after a unit delay of $z^{-1}$ together with the current states $\boldsymbol{x}$. To prevent confusion, the index for the control time step is omitted; it is not equal to the time step $k$ within the prediction horizon of the optimization problem.
  • Figure 5: (a) Prediction on test data with root-mean-square error (RMSE) $e_i$. RNNs receive measurements to initialize the hidden states (gray area). They then predict the further course solely from their own outputs and the given inputs. In contrast to GRU$_{\mathrm{full}}$, which includes the position and velocity as states, GRU$_\mathrm{pos}$ and LSTM$_\mathrm{pos}$ only use the position. (b) Performance within a short (0.8 s) prediction horizon. Networks receive measured states at $t=100\,\mathrm{s}$ and must predict the future course recursively, which simulates the use within MPC. Hidden states of GRU$_\mathrm{zero}$ are naively initialized with zeros. GRU$_\mathrm{pos}$ and GRU$_\mathrm{const}$ receive initialized hidden states, which are available from past predictions. With GRU$_\mathrm{const}$, the hidden states are kept constant, which still results in high accuracy within the horizon. GRU$_\mathrm{ref}$ represents a conventionally trained network and shows larger deviations despite initialized hidden states.
  • ...and 3 more figures
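
The evaluation procedure described in the Figure 5 caption (warm up the hidden state on measurements, then predict recursively from the network's own outputs and score the result with the RMSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's network: it uses a scalar GRU cell with one state and one input, the weights in `params` are made-up values rather than trained ones, and the measurement sequence is synthetic.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, u, h, p):
    """One step of a scalar GRU cell fed with state x and input u."""
    wx, wu, wh, b = p["z"]
    z = sigmoid(wx * x + wu * u + wh * h + b)           # update gate
    wx, wu, wh, b = p["r"]
    r = sigmoid(wx * x + wu * u + wh * h + b)           # reset gate
    wx, wu, wh, b = p["n"]
    n = math.tanh(wx * x + wu * u + wh * (r * h) + b)   # candidate hidden state
    h_new = (1.0 - z) * n + z * h                       # blend candidate and old state
    w_out, b_out = p["out"]
    return w_out * h_new + b_out, h_new                 # predicted next state, new hidden state

def predict(x_meas, u_seq, n_warmup, p):
    """Warm up on measurements, then feed predictions back recursively."""
    h, x_in, preds = 0.0, x_meas[0], []
    for k, u in enumerate(u_seq):
        if k < n_warmup:
            x_in = x_meas[k]                # warm-up: measured state shapes h
        x_in, h = gru_step(x_in, u, h, p)   # afterwards x_in is the model's own output
        preds.append(x_in)
    return preds

def rmse(pred, ref):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, ref)) / len(pred))

# Hypothetical (untrained) weights, ordered (w_x, w_u, w_h, bias) per gate.
params = {"z": (0.5, 0.3, 0.2, 0.0), "r": (0.1, -0.2, 0.4, 0.0),
          "n": (0.8, 0.6, 0.5, 0.0), "out": (1.0, 0.0)}
x_meas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]   # synthetic measurements
u_seq = [1.0] * 5                          # synthetic input sequence
preds = predict(x_meas, u_seq, n_warmup=2, p=params)
print("RMSE vs. measurements:", rmse(preds, x_meas[1:]))
```

Setting `n_warmup=0` corresponds to the naive zero-initialized case in the caption, since the hidden state then starts at zero when recursive prediction begins. With untrained weights the printed RMSE carries no meaning beyond showing how the score is computed.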