Neural reservoir control of a soft bio-hybrid arm

Noel Naughton; Arman Tekinalp; Keshav Shivam; Seung Hung Kim; Volodymyr Kindratenko; Mattia Gazzola

TL;DR

Implementing a spiking reservoir on neuromorphic hardware improves energy efficiency by nearly two orders of magnitude relative to standard CPUs, with implications for the on-board control of untethered, small-scale soft robots.

Abstract

The control of soft robots is a long-standing engineering problem, made difficult by their highly non-linear, heterogeneous, anisotropic, and distributed nature. Here, bridging engineering and biology, we employ a neural reservoir for the dynamic control of a bio-hybrid model arm made of multiple muscle-tendon groups enveloping an elastic spine. We show how the use of reservoirs facilitates simultaneous control and self-modeling across a set of challenging tasks, outperforming classic neural network approaches. Further, implementing a spiking reservoir on neuromorphic hardware improves energy efficiency by nearly two orders of magnitude relative to standard CPUs, with implications for the on-board control of untethered, small-scale soft robots.
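To make the term concrete, the sketch below shows the generic echo-state form of a reservoir: a fixed, randomly initialized recurrent layer whose state is updated as `tanh(W_in u + W_rec x)`, followed by a linear readout, which is the only part that training would adjust. This is an illustrative assumption, not the architecture used in the paper: the function names (`make_reservoir`, `step`, `readout`), the uniform weight initialization, and the `scale` parameter are all hypothetical, and no training is performed here.

```python
import math
import random


def make_reservoir(n_neurons, n_inputs, scale=0.1, seed=0):
    """Draw fixed random input and recurrent weights (hypothetical initialization)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_neurons)]
    w_rec = [[rng.uniform(-scale, scale) for _ in range(n_neurons)]
             for _ in range(n_neurons)]
    return w_in, w_rec


def step(state, inputs, w_in, w_rec):
    """One reservoir update: x_next = tanh(W_in u + W_rec x). Weights stay fixed."""
    new_state = []
    for i in range(len(state)):
        drive = sum(w * u for w, u in zip(w_in[i], inputs))
        drive += sum(w * x for w, x in zip(w_rec[i], state))
        new_state.append(math.tanh(drive))
    return new_state


def readout(state, w_out):
    """Linear readout mapping the reservoir state to actions (the trainable part)."""
    return [sum(w * x for w, x in zip(row, state)) for row in w_out]


if __name__ == "__main__":
    w_in, w_rec = make_reservoir(n_neurons=8, n_inputs=3, seed=1)
    state = [0.0] * 8
    for observation in ([0.5, -0.2, 0.1], [0.4, -0.1, 0.0]):
        state = step(state, observation, w_in, w_rec)
    # Untrained readout with sixteen outputs, one per muscle-tendon unit.
    w_out = [[0.0] * 8 for _ in range(16)]
    print(len(readout(state, w_out)))
```

Because the recurrent weights are never updated, the reservoir state depends only on the input history; whatever learning method is used only has to fit the readout.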

Paper Structure

This paper contains 15 sections, 11 equations, 4 figures.

Acknowledgements
Methods

Figures (4)

Figure 1: Neural reservoir control of a soft bio-hybrid arm. (a) Schematic depicting the coupling of a neural reservoir with a soft bio-hybrid arm made of sixteen muscle-tendon units enveloping an elastic spine to achieve autonomous control. (b) Learning performance for the control of a soft bio-hybrid arm (backbone stiffness: 125 kPa) tracking a 3D moving target: neural reservoir and traditional feedforward (FF) and LSTM network architectures. The solid lines depict the average learning performance (average episode return), each obtained from five instances of the neural architecture initialized with different random number generator seeds. The shaded regions depict the spread in learning performance across those five instances. (c) Violin plots of average performance of trained policies evaluated over 250 episodes, for different reservoir sizes and reference architectures (FF and LSTM). Inset shows learning performance as reservoir size increases, illustrating that learning becomes both faster and better overall as the reservoir grows. For FF and LSTM, different numbers of neurons and layers / stacks were considered (FF: [64$\times$64], [128$\times$128], [64$\times$64$\times$64], [64$\times$64$\times$64$\times$64]; LSTM: [64$\times$1 stack], [128$\times$1 stack], [256$\times$1 stack], [64$\times$2 stacks], [128$\times$2 stacks], [256$\times$2 stacks]), with no significant differences observed. Here we report data for the best performing networks (see SI for full details). Overlaid snapshots of (d) side view and (e) top view of the bio-hybrid arm successfully tracking a 3D target (see also SI Video 1) using a trained neural reservoir control policy. Intensity of muscle color (pale to dark red) denotes a muscle's activation level.
Figure 2: Control of increasingly compliant bio-hybrid arms. (a) Learning performance of a neural reservoir (4096 neurons) compared to LSTM and FF networks for a soft bio-hybrid arm with decreasing backbone elastic moduli. The neural reservoir exhibits a compact performance envelope while the LSTM and FF networks exhibit much wider performance envelopes, with control performance decreasing drastically as the backbone softens. Solid lines and shaded regions denote the performance average and spread relative to five randomly initialized network instances (as in Fig. 1b). Violin plots of the average (b) kinetic and (c) bending energy of the arm's spine, when neural reservoir (RC), LSTM, and FF network architectures are used for controlling systems of decreasing backbone stiffness. Values reported are total kinetic/bending energy integrated over the episode and normalized by episode length. The RC network exhibits lower energies in all cases. Bending energy is observed to decrease linearly in the case of RC, in keeping with the linear softening of the spine and in contrast with FF and LSTM, which exhibit sublinear scaling. (d) Snapshots of trained policy performance for different backbone stiffness levels illustrating control failure modes. While all three network architectures successfully control a soft arm with a backbone stiffness of 1 MPa, for a stiffness of 250 kPa the feedforward network fails to coordinate muscle contractions, demonstrating an excessive bending failure mode (with an associated increased bending energy compared to RC, see panel (c)). While the LSTM network exhibits better performance, it also produces excessive bending and fails as the backbone stiffness continues to decrease (62.5 kPa). Video comparison of tracking performances for all backbone stiffness cases is available in SI Video 2. As in Fig. 1, for FF and LSTM, different numbers of neurons and layers / stacks were considered, with no significant differences observed across arm stiffnesses. Here we report data for the best performing networks (see SI for full details).
Figure 3: Parallel maps for self-modeling and robust control. (a) Schematic of neural reservoir control equipped with additional, parallel maps to infer/predict state information. All reported results are for an arm with a backbone stiffness of 250 kPa controlled by an already trained (via RL) neural reservoir with 4096 neurons, as described in Fig. 1. (b) Accuracy of parallel map estimates of future target positions, for increasing time-windows into the future. (c) Accuracy of parallel map estimates of the arm's future tip position, for increasing time-windows into the future. For both (b) and (c), the relative error is defined as $||\mathbf{\hat{y}} - \mathbf{y}||/\ell$ where $|| \cdot ||$ is the L2-vector norm, $\mathbf{\hat{y}}$ is the predicted position, $\mathbf{y}$ is the true position, and $\ell$ is the length of the soft arm. (d) Performance of reservoir self-modeling for estimation of current arm pose (top row). Heat map of accuracy of pose estimation along the length of the arm. Color denotes relative error of estimation of the position of a point $s\in[0,\ell]$ along the soft arm of length $\ell$. Relative error is defined as $||\mathbf{\hat{y}}(s) - \mathbf{y}(s)||/s$ where $\mathbf{\hat{y}}(s)$ is the predicted location of point $s$ and $\mathbf{y}(s)$ is the true location of that point. Accuracy is initially lower due to transient startup effects in the reservoir before reaching a consistent level of high accuracy, as confirmed (mid/bottom row) by visualization of the estimated (gold) and true (full color) arm pose at selected time instances. (e) Tracking performance of the neural reservoir when the target position becomes unavailable for increasing lengths of time. The target alternates between being measured ('seen') and inferred ('blind') for equal time intervals that range in length from 0.25 seconds to 3 seconds. Violin plots show performance over 50 trials for increasing periods of time during which the arm is blinded. The blue shaded region shows the baseline performance of the reservoir when the target location is always known. (f) Comparison of the inferred and true 3D target position for a three-second inference period. After approximately 1.5 seconds, the estimate of the target trajectory sometimes drifts from the true trajectory, leading to the drop in tracking performance seen in panel (e).
Figure 4: Spiking neural reservoir control on neuromorphic hardware. (a) Schematic showing how the state is encoded into a spike train (SI) that is sent to a spiking neural reservoir running on an Intel Loihi chip. Spike train outputs are then decoded (SI) into continuous actions (contractions of the arm muscles). (b) Comparison between the performance over 250 episodes of a trained spiking neural reservoir on Loihi (backbone stiffness: 125 kPa; reservoir size: 2048 neurons, which is the largest implementable on Loihi), a neural reservoir running on traditional silicon using non-spiking artificial neurons, and an LSTM network (same as Fig. 1 and Fig. 2). Inset shows the training performance for the spiking reservoir, the non-spiking reservoir, and the LSTM. (c) The spiking reservoir exhibits a compact performance envelope as the backbone stiffness decreases. All cases are trained using five random initialization seeds as in Fig. 1b. (d) Energy use of a non-spiking reservoir running on traditional silicon (Intel Xeon W-2665) and the spiking reservoir running on the Intel Loihi chip. The non-spiking reservoir on traditional silicon exhibits quadratic energy scaling as the reservoir size increases, compared to the linear energy scaling of the spiking reservoir running on the Intel Loihi chip. (e) Initial 30k episodes of learning performance of neural reservoirs (n=1024) controlling arms of decreasing backbone stiffness tasked with reaching through a cluttered nest of obstacles to a fixed target (see SI for expanded results). Video of the performance of final trained policies for all stiffness levels is available in SI Video 6. (f) Timelapse front/side views of a spiking reservoir guiding a soft arm through unstructured obstacles to reach a target (backbone stiffness: 125 kPa).
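The two relative-error definitions given in the Figure 3 caption can be sketched as follows: tip and target predictions are normalized by the arm length $\ell$, while pose estimates along the arm are normalized by the arc-length position $s$ of each point. The function names and the numbers in the example are hypothetical, chosen only for illustration; the sketch assumes each position is a 3D coordinate tuple and that pose points are sampled at $s > 0$ to avoid division by zero.

```python
import math


def relative_error(predicted, true, scale):
    """Euclidean (L2) distance between two points, divided by a length scale."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return math.dist(predicted, true) / scale


def pose_relative_errors(predicted_pose, true_pose, arc_lengths):
    """Per-point pose error, each normalized by that point's arc length s (s > 0)."""
    return [
        relative_error(p, t, s)
        for p, t, s in zip(predicted_pose, true_pose, arc_lengths)
    ]


if __name__ == "__main__":
    arm_length = 10.0  # hypothetical arm length, arbitrary units
    # Tip or target prediction error, normalized by arm length.
    print(relative_error((3.0, 4.0, 0.0), (0.0, 0.0, 0.0), arm_length))
    # Pose estimation error at two points along the arm, normalized by s.
    print(pose_relative_errors(
        [(0.0, 0.0, 5.0), (0.0, 1.0, 10.0)],
        [(0.0, 0.0, 5.0), (0.0, 0.0, 10.0)],
        [5.0, 10.0],
    ))
```

Normalizing by $s$ rather than $\ell$ weights errors near the base of the arm more heavily, since the same absolute displacement is divided by a smaller length there.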
