Evolution and learning in differentiable robots

Luke Strgar, David Matthews, Tyler Hummer, Sam Kriegman

TL;DR

The paper integrates massively parallel differentiable simulations with evolutionary search to co-optimize robot morphology and neural control, enabling exploration of orders of magnitude more designs than traditional, non-differentiable approaches ($10^{7}$ designs over $1000$ generations). It demonstrates that evolution discovers morphologies that become increasingly differentiable, aiding gradient-based learning, and reports a successful sim2real transfer for a manufactured design. The results highlight how body structure can shape trainability and locomotion across terrains, offering a cyberphysical platform to study evolution-learning interactions in embodied agents. This scalable framework paves the way for discovering complex, trainable morphologies that can be realized in hardware and studied in real-world environments.

Abstract

The automatic design of robots has existed for 30 years but has been constricted by serial non-differentiable design evaluations, premature convergence to simple bodies or clumsy behaviors, and a lack of sim2real transfer to physical machines. Thus, here we employ massively-parallel differentiable simulations to rapidly and simultaneously optimize individual neural control of behavior across a large population of candidate body plans and return a fitness score for each design based on the performance of its fully optimized behavior. Non-differentiable changes to the mechanical structure of each robot in the population -- mutations that rearrange, combine, add, or remove body parts -- were applied by a genetic algorithm in an outer loop of search, generating a continuous flow of novel morphologies with highly-coordinated and graceful behaviors honed by gradient descent. This enabled the exploration of several orders-of-magnitude more designs than all previous methods, despite the fact that robots here have the potential to be much more complex, in terms of number of independent motors, than those in prior studies. We found that evolution reliably produces ``increasingly differentiable'' robots: body plans that smooth the loss landscape in which learning operates and thereby provide better training paths toward performant behaviors. Finally, one of the highly differentiable morphologies discovered in simulation was realized as a physical robot and shown to retain its optimized behavior. This provides a cyberphysical platform to investigate the relationship between evolution and learning in biological systems and broadens our understanding of how a robot's physical structure can influence the ability to train policies for it. Videos and code at https://sites.google.com/view/eldir.
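The two nested loops described in the abstract (an outer genetic algorithm over body plans and an inner gradient-based learner per body) can be illustrated with a toy sketch. This is not the authors' code. The scalar `body`, the single `weight`, the quadratic `performance` function, the learning rate, and the small population size are all assumptions chosen so the example runs in pure Python. Only the count of 35 gradient steps, the rule that offspring do not inherit weights, and the rule that fitness is the best score seen during training come from the figure captions.

```python
import random

GD_STEPS = 35        # gradient steps per robot, as stated in the Figure 1 caption
LEARNING_RATE = 0.1  # assumed value, not taken from the paper


def performance(body, weight):
    """Toy stand-in for a differentiable simulation (hypothetical objective)."""
    # Best possible score is 0, reached when body * weight == 1.
    return -(body * weight - 1.0) ** 2


def performance_grad(body, weight):
    """Analytic derivative of the toy objective with respect to the weight."""
    return -2.0 * body * (body * weight - 1.0)


def train(body, rng):
    """Inner loop: gradient ascent from a fresh random weight; return the best score."""
    weight = rng.uniform(-1.0, 1.0)  # weights are not inherited from the parent
    best = performance(body, weight)
    for _ in range(GD_STEPS):
        weight += LEARNING_RATE * performance_grad(body, weight)
        best = max(best, performance(body, weight))
    return best


def mutate(body, rng):
    """Non-differentiable variation: a random rescaling, clamped to [0.05, 5]."""
    return min(5.0, max(0.05, body * rng.uniform(0.5, 2.0)))


def evolve(pop_size=20, generations=30, seed=0):
    """Outer loop: each parent spawns one child, children are trained, top half survives."""
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        body = rng.uniform(0.05, 5.0)
        population.append((train(body, rng), body))
    best_per_generation = []
    for _ in range(generations):
        offspring = []
        for _, parent in population:
            child = mutate(parent, rng)
            offspring.append((train(child, rng), child))
        # Survivors keep their stored score; each design is evaluated only once.
        population = sorted(population + offspring, reverse=True)[:pop_size]
        best_per_generation.append(population[0][0])
    return population, best_per_generation
```

In this toy, the body sets how quickly a fixed budget of gradient steps converges, so selection favors bodies on which learning works well. That is only a loose analogy to the paper's "increasingly differentiable" robots, not a reproduction of the result.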

Paper Structure (29 sections, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Evolution and learning. An initially random population of 10K progenitor robots (A) produces 10K offspring (B) through random morphological mutations and/or crossover, temporarily doubling the size of the population to 20K robots. The 10K offspring are then simultaneously trained in parallel differentiable simulations as follows. Each robot has its own proprioceptive neural network (C) that coordinates the actuation of its motors (D) through differentiable simulation to produce behavior (E), yielding a performance score based on the net displacement of the robot in the desired direction of travel (toward the right-hand side of the page). Gradients are then propagated backward in time through the simulated behavior (F) and used to update the neural net's initially random synaptic weights so as to improve its performance. This process is repeated for 34 additional iterations of gradient descent (35 total gradient descent steps), after which the robot's fitness is taken to be the best performance score it achieved during training. Finally, selection (G) reduces the population back to 10K robots by deleting the worst-performing robots. Evolution then proceeds to the next generation (H), and this process of design variation, parallel differentiable training, and selection is repeated 998 times for a total of 1000 generations of evolution (I). A build filter (J) was then used to identify the most manufacturable designs discovered in simulation, for example the simulated design in K, which was printed (L) and fitted with shape memory alloy springs that can be repeatedly energized and cooled to generate forward locomotion.
  • Figure 2: A diversity of body shapes was discovered by evolution, each with its own unique internal distribution of active (green) and passive (gray) springs. The optimized performance (displacement in the desired direction of travel) achieved by gradient-based learning is denoted above each design in meters (m). On rugged terrains, bipedal body plans often emerged (A2, A6, B1, B4, B5, D3, E4, F3, F5), whereas on flat terrain robots often evolved three or four legs (A3, C1, C3, C4). Occasionally, swinging limbs that did not make contact with the ground were used to generate forward momentum (A4, B1, B4, E5). Videos and code can be found at https://sites.google.com/view/eldir.
  • Figure 3: Neural control of behavior. Each robot is controlled by a three-layer fully-connected neural network (A). At every time step, motor neurons output the rest lengths of each active spring in the robot's body (B). The sensory repercussions of these actions are captured by four proprioceptors at each of the robot's masses (concentric circles in C), which feed back into the nervous system (D) alongside central pattern generators (CPGs; E), closing the control loop. More precisely, there are 10 CPGs corresponding to 10 phase-shifted sinusoidal waves, and four proprioceptive channels track the vertical and horizontal velocity of each mass, as well as the vertical and horizontal displacement of each mass relative to the robot's center of mass, during behavior.
  • Figure 4: Manufacture. Designs discovered in simulation (A) were assembled from 3D-printed hexagonal masses (B), active two-way shape memory alloy springs (C: contracted; D: expanded), and passive springs (E). Two-dimensional simulated designs were transferred to 3D physical robots by connecting two parallel planes of masses and springs with biplane connectors (F). (G-J) The resultant robot has two active springs per planar face (four total; green boxes in G) that can be independently energized (lightning bolts in H-J) to deform the robot's body.
  • Figure 5: Increasingly differentiable robots. The initially random behavior (light green) and the learned behavior (dark green) of the most performant design are plotted for five independent evolutionary trials (five pairs of light/dark lines) over flat terrain. Initial robot behavior (at the first iteration of gradient descent) produces little to no forward locomotion, whereas locomotive ability after learning continues to improve over evolutionary time. Each design was evaluated only once but may continue to survive and reproduce over many generations. Synaptic weights were not transmissible from parent to offspring.
  • ...and 6 more figures
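
The control loop in the Figure 3 caption can be sketched as a single forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation. The caption supplies the ten phase-shifted sinusoids, the four proprioceptive channels per mass, and the rest-length outputs. The even phase spacing, the CPG frequency, the hidden-layer width, the `tanh` activations, the weight initialization, and the `amplitude` scaling of rest lengths are all hypothetical choices, as are the function names.

```python
import math
import random

NUM_CPGS = 10  # ten phase-shifted sinusoids, per the Figure 3 caption


def cpg_signals(t, frequency=1.0):
    """Ten sinusoids with evenly spaced phase offsets (spacing and frequency assumed)."""
    return [math.sin(2 * math.pi * (frequency * t + k / NUM_CPGS))
            for k in range(NUM_CPGS)]


def proprioception(positions, velocities):
    """Four channels per mass: velocity (x, y) and offset from the center of mass (x, y)."""
    n = len(positions)
    com_x = sum(p[0] for p in positions) / n
    com_y = sum(p[1] for p in positions) / n
    features = []
    for (px, py), (vx, vy) in zip(positions, velocities):
        features += [vx, vy, px - com_x, py - com_y]
    return features


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases for one dense layer (initialization assumed)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases, activation):
    """Fully-connected layer: activation(W @ inputs + b), written with plain lists."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def controller_step(t, positions, velocities, params, base_lengths, amplitude=0.2):
    """Map CPG and proprioceptive inputs to one rest length per active spring."""
    inputs = cpg_signals(t) + proprioception(positions, velocities)
    (w1, b1), (w2, b2) = params
    hidden = dense(inputs, w1, b1, math.tanh)
    outputs = dense(hidden, w2, b2, math.tanh)
    # Each tanh output in (-1, 1) scales a spring's rest length by up to +/- amplitude.
    return [length * (1.0 + amplitude * out)
            for length, out in zip(base_lengths, outputs)]
```

For a robot with three masses and two active springs, the input layer has 10 + 4 × 3 = 22 units, so `params` could be built as `(init_layer(8, 22, rng), init_layer(2, 8, rng))` with `rng = random.Random(0)`.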