Hitting the Gym: Reinforcement Learning Control of Exercise-Strengthened Biohybrid Robots in Simulation

Saul Schaffer; Hima Hrithik Pamu; Victoria A. Webster-Wood

TL;DR

This work addresses the challenge of designing and controlling biohybrid robots whose muscle actuators change their force output depending on use history. It integrates a muscle-adaptation model into a PyElastica-based lattice-worm simulator and applies Proximal Policy Optimization (PPO) to coordinate 42 distributed muscles in steering the worm toward eight targets, using reinforcement learning as both a system controller and a co-design tool. Adaptive agents reach higher maximum rewards and train faster than non-adaptive ones, and the trained policies indicate which muscles matter for a given task, which can inform fabrication decisions. The approach combines mechanics, control, and adaptive actuation in a single design and modeling tool for many-muscle biohybrid robots.
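One way to read "which muscles matter" is to rank muscles by how strongly a trained policy drives them during a rollout. The snippet below is only a minimal sketch of that idea: the function name `rank_muscles_by_activation`, the logged-activation array layout, and the use of mean absolute activation as the intensity score are all assumptions for illustration, not the authors' procedure, and the data is synthetic.

```python
import numpy as np


def rank_muscles_by_activation(activations: np.ndarray, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k muscles by mean absolute activation.

    activations: hypothetical log of shape (timesteps, n_muscles) holding the
    per-step activation commands recorded from one policy rollout.
    """
    intensity = np.abs(activations).mean(axis=0)  # one score per muscle
    order = np.argsort(intensity)[::-1]           # strongest first
    return order[:top_k]


# Synthetic rollout: 42 muscles, three of which are driven much harder.
rng = np.random.default_rng(0)
acts = rng.uniform(0.0, 0.05, size=(200, 42))
acts[:, [3, 17, 40]] += 0.8

top = rank_muscles_by_activation(acts, top_k=3)
print(sorted(top.tolist()))
```

Muscles that consistently score near zero across seeds and targets would be candidates for omission at fabrication time, subject to checking that removing them does not change the learned behavior.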

Abstract

Animals can accomplish many incredible behavioral feats across a wide range of operational environments and scales that current robots struggle to match. One explanation for this performance gap is the extraordinary properties of the biological materials that comprise animals, such as muscle tissue. Using living muscle tissue as an actuator can endow robotic systems with highly desirable properties such as self-healing, compliance, and biocompatibility. Unlike traditional soft robotic actuators, living muscle biohybrid actuators exhibit unique adaptability, growing stronger with use. The dependency of a muscle's force output on its use history endows muscular organisms with the ability to dynamically adapt to their environment, getting better at tasks over time. While muscle adaptability is a benefit to muscular organisms, it currently presents a challenge for biohybrid researchers: how does one design and control a robot whose actuators' force output changes over time? Here, we incorporate muscle adaptability into a many-muscle biohybrid robot design and modeling tool, leveraging reinforcement learning as both a co-design partner and system controller. As a controller, our learning agents coordinated the independent contraction of 42 muscles distributed on a lattice worm structure to successfully steer it towards eight distinct targets while incorporating muscle adaptability. As a co-design tool, our agents enable users to identify which muscles are important to accomplishing a given task. Our results show that adaptive agents outperform non-adaptive agents in terms of maximum rewards and training time. Together, these contributions can both enable the elucidation of muscle actuator adaptation and inform the design and modeling of adaptive, performant, many-muscle robots.
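To make "growing stronger with use" concrete, the following is a minimal sketch of a use-dependent force ceiling that is updated between episodes. The paper's actual adaptation equations are not reproduced here, so the multiplicative update, the `gain` and `max_ceiling` parameters, and the normalized `usage` measure are hypothetical placeholders; only the muscle count of 42 comes from the abstract.

```python
import numpy as np

N_MUSCLES = 42  # from the abstract; every other constant below is assumed


def update_force_ceiling(ceiling, usage, gain=0.05, max_ceiling=2.0):
    """Hypothetical adaptation rule applied once per episode.

    ceiling: current per-muscle force ceiling (arbitrary units).
    usage:   per-muscle exercise measure from the last episode, in [0, 1].
    A used muscle's ceiling grows by up to `gain` per episode and saturates
    at `max_ceiling`; an unused muscle keeps its ceiling.
    """
    usage = np.clip(usage, 0.0, 1.0)
    return np.minimum(ceiling * (1.0 + gain * usage), max_ceiling)


def muscle_force(activation, ceiling):
    """Scale an activation command in [0, 1] by the current force ceiling."""
    return np.clip(activation, 0.0, 1.0) * ceiling


ceiling = np.ones(N_MUSCLES)
usage = np.zeros(N_MUSCLES)
usage[:3] = 1.0  # pretend only the first three muscles are exercised

for _ in range(10):  # ten episodes with the same usage pattern
    ceiling = update_force_ceiling(ceiling, usage)

print(ceiling[:5])
```

Because the ceiling changes between episodes, the same activation command produces a different force later in training, which is the control difficulty the abstract describes.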

Paper Structure (13 sections, 3 equations, 6 figures, 1 table)

Introduction
Methods
Reinforcement Learning Model Setup
PyElastica Simulation Environment
Adaptive Muscle Force Output
Experiments
Results and Discussion
Comparing Adaptable and Non-adaptable Agent Performance
Adaptation Across Episodes
Identifying Required Muscles From Activation Intensity
Current Adaptation Implementation Limitations
Conclusion
Acknowledgements

Figures (6)

Figure 1: Reinforcement learning control policy coordinates 42 distributed muscles and achieves successful lattice worm reaching. A lattice worm, shown from three different views, starts in an undeformed state in the top panels and is deformed by muscle actuation in the bottom panels. a) Front view of lattice worm in x-z plane. b) Side view of lattice worm in y-z plane. c) Top view of lattice worm in y-x plane. For each view, the top panel shows the state of the lattice worm in the initial state at $t=t_{0}$, and the bottom panel shows the lattice worm in the final state at $t=t_{f}$.
Figure 2: Lattice worm robot and target locations. The robot is discretized, with each element represented as a sphere. Purple represents the structural rods. Red represents the muscle rods. Black represents the connection points between rods. Green numbered dots mark the eight target locations.
Figure 3: Training results for lattice worm reaching toward eight distinct corners. Each plot shows the training results for both the adaptive and non-adaptive muscle cases over 8,500 episodes. Curves are the rolling 50-sample average of the combined results across 5 seeds, with the shaded regions representing one standard deviation. Episodes resulting in simulation instability were excluded. The legend in Corner 8 applies to all corners.
Figure 4: Reward maximum averaged across seeds for each of the eight target corners. Bars are the mean, and whiskers are one standard deviation.
Figure 5: Representative muscle adaptation across learning episodes. For each of the 42 muscles patterned on the lattice worm, the blue points are dense enough to form a line and represent the force ceiling for that muscle in a given episode. The red points represent the force a given muscle produced during a given episode. Force increases result from exercise in previous episodes, quantified through the strain and use of the muscle. For muscles that were used in previous episodes, the force ceiling increases. Data from adaptive lattice worm reaching for corner 4, seed=106.
...and 1 more figures
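The statistics described in the captions for Figures 3 and 4 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' analysis code: it assumes a reward array of shape (seeds, episodes), smooths each seed separately before taking the mean and population standard deviation across seeds, uses synthetic rewards, and does not model the exclusion of unstable episodes. The helper names `rolling_mean` and `summarize` are hypothetical.

```python
import numpy as np


def rolling_mean(x: np.ndarray, window: int = 50) -> np.ndarray:
    """Rolling average over full windows only (output length n - window + 1)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


def summarize(rewards: np.ndarray, window: int = 50):
    """Summarize per-episode rewards of shape (n_seeds, n_episodes).

    Returns the across-seed mean and standard deviation of the smoothed
    curves (Figure 3 style) and the mean and standard deviation of each
    seed's maximum reward (Figure 4 style).
    """
    smoothed = np.stack([rolling_mean(r, window) for r in rewards])
    max_per_seed = rewards.max(axis=1)
    return (
        smoothed.mean(axis=0),
        smoothed.std(axis=0),
        max_per_seed.mean(),
        max_per_seed.std(),
    )


# Synthetic stand-in for 5 seeds x 8,500 episodes of training rewards.
rng = np.random.default_rng(0)
rewards = rng.normal(size=(5, 8500)).cumsum(axis=1)

curve_mean, curve_std, max_mean, max_std = summarize(rewards)
print(curve_mean.shape, max_mean, max_std)
```

Pooling all seeds before smoothing is another plausible reading of "combined results across 5 seeds"; the two choices give different shaded bands, so the convention should be checked against the paper before comparing numbers.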
