Accelerated co-design of robots through morphological pretraining

Luke Strgar, Sam Kriegman

TL;DR

This work addresses the co-design of robot morphology and control by using large-scale morphological pretraining to learn a universal, morphology-agnostic controller through differentiable simulation. With the pretrained controller kept frozen, non-differentiable body changes can be evaluated zero-shot, which quickly yields diverse, high-performing designs. Few-shot evolution, in which the controller is finetuned on the current population at each generation, preserves and increases diversity while improving performance. By contrast, simultaneous co-design from scratch without pretraining suffers from diversity collapse and slower progress: the population converges to similar designs that are easier to steer with a shared controller. The work also raises considerations for crossover and diversity maintenance, and points to possible pathways toward self-reconfigurable robots, real-world transfer, and multi-task expansion.

Abstract

The co-design of robot morphology and neural control typically requires using reinforcement learning to approximate a unique control policy gradient for each body plan, demanding massive amounts of training data to measure the performance of each design. Here we show that a universal, morphology-agnostic controller can be rapidly and directly obtained by gradient-based optimization through differentiable simulation. This process of morphological pretraining allows the designer to explore non-differentiable changes to a robot's physical layout (e.g. adding, removing and recombining discrete body parts) and immediately determine which revisions are beneficial and which are deleterious using the pretrained model. We term this process "zero-shot evolution" and compare it with the simultaneous co-optimization of a universal controller alongside an evolving design population. We find the latter results in diversity collapse, a previously unknown pathology whereby the population (and thus the controller's training data) converges to similar designs that are easier to steer with a shared universal controller. We show that zero-shot evolution with a pretrained controller quickly yields a diversity of highly performant designs, and by fine-tuning the pretrained controller on the current population throughout evolution, diversity is not only preserved but significantly increased as superior performance is achieved.
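
The difference between evolving with a frozen controller and finetuning it on the current population can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: `toy_loss` replaces simulation with the shared controller, `flip_one_voxel` is one possible non-differentiable edit, the `finetune` hook is a placeholder for a controller update, and selection over the merged parents and offspring is an assumed scheme.

```python
import random


def toy_loss(design):
    # Stand-in for simulating a design with the shared controller:
    # fewer empty voxels counts as "better" (an arbitrary choice).
    return design.count(0)


def flip_one_voxel(design, rng):
    # Non-differentiable edit: add or remove one discrete body part.
    i = rng.randrange(len(design))
    return design[:i] + (1 - design[i],) + design[i + 1:]


def evolve(population, loss_fn, mutate, generations, finetune=None, rng=None):
    """Evolve designs that are all scored through the same loss_fn.

    finetune=None     -> the controller behind loss_fn stays frozen (zero-shot style)
    finetune=callable -> called on the current population every generation
                         (few-shot style); the hook itself is hypothetical
    """
    rng = rng or random.Random(0)
    size = len(population)
    for _ in range(generations):
        if finetune is not None:
            finetune(population)
        offspring = [mutate(rng.choice(population), rng) for _ in range(size)]
        # Assumed selection rule: keep the `size` lowest-loss designs overall.
        population = sorted(population + offspring, key=loss_fn)[:size]
    return population


rng = random.Random(1)
designs = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(16)]
survivors = evolve(designs, toy_loss, flip_one_voxel, generations=5, rng=rng)
print(min(map(toy_loss, designs)), "->", min(map(toy_loss, survivors)))
```

Because parents compete with their offspring, the best loss in this sketch can never get worse from one generation to the next; passing a `finetune` callable only changes when the shared controller would be updated, not how designs are selected.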

Paper Structure

This paper contains 18 sections, 13 figures.

Figures (13)

  • Figure 1: Universal control of differentiable robots. Large-scale pretraining and finetuning of a universal controller was achieved by averaging simulation gradients across the robot's body, world, and goal. The controller is shared by an arbitrarily large and morphologically diverse population of robots as they undergo morphological evolution. The objective is to find designs that can move quickly across a previously-unseen terrain toward a randomly-positioned light source (glowing white spheres).
  • Figure 2: Overview of the proposed method. End-to-end differentiable policy training across tens of millions of morphologically distinct robots (morphological pretraining) produces a universal controller, which was kept frozen throughout zero-shot evolution and finetuned at each generation of few-shot evolution.
  • Figure 3: Few-shot evolution. A population of 8192 initially random designs (a pair of which are shown in the top row) were randomly recombined and mutated to produce 8192 offspring, temporarily expanding the population to 16384 designs. All designs in the population were driven by the same universal controller, which was rapidly pretrained (before evolution) and finetuned for the current population (at every generation of evolution) using analytical gradients from differentiable simulation. Deleting the worst-performing designs and replacing them with the best offspring, and repeating this process for several generations, yields a diversity of increasingly performant designs, and ultimately a final population of 8192 winning designs (bottom row), each with its own unique evolutionary history (phylogeny). An example phylogenetic tree, colored by loss (decreasing from gray to cyan to pink), is shown for one of the winning designs.
  • Figure 4: Genotype to phenotype. Designs are encoded by voxel genotype (A), which is expressed as a spring-mass phenotype (B), and evaluated in a differentiable environment (C). The springs (teal lines in B and C) and masses (small orange spheres) are motorized and sensorized, respectively.
  • Figure 5: Recombination of substructures. A pair of designs (parents; A, B) is combined via crossover to produce a new design (offspring; C) that inherits components from both parents.
  • ...and 8 more figures
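
The genotype-to-phenotype step in the Figure 4 caption can be illustrated with a small Python sketch. It rests on assumptions not stated in the captions: a 2D grid of 0/1 voxels, one mass per voxel corner shared between neighbors, and a spring between every pair of corners of a filled voxel. The function name `express` is hypothetical, and the paper's actual encoding may differ.

```python
from itertools import combinations


def express(genotype):
    """Map a 2D grid of 0/1 voxels to (masses, springs).

    masses:  dict from corner coordinate (x, y) to a mass index
    springs: sorted list of (i, j) mass-index pairs with i < j
    """
    masses = {}
    springs = set()
    for y, row in enumerate(genotype):
        for x, filled in enumerate(row):
            if not filled:
                continue
            corners = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
            # Reuse a mass when a neighboring voxel already created it.
            ids = [masses.setdefault(c, len(masses)) for c in corners]
            # Assumed layout: 4 edge springs plus 2 diagonals per voxel.
            for a, b in combinations(ids, 2):
                springs.add((min(a, b), max(a, b)))
    return masses, sorted(springs)


masses, springs = express([[1, 1]])
print(len(masses), "masses,", len(springs), "springs")
```

Storing springs in a set deduplicates the edge shared by adjacent voxels, so two side-by-side voxels yield 6 masses and 11 springs instead of 8 and 12.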