Towards Multi-Morphology Controllers with Diversity and Knowledge Distillation

Alican Mertan, Nick Cheney

TL;DR

This work tackles the problem of learning controllers that generalize across many morphologies. It pairs a Quality Diversity algorithm (MAP-Elites), which builds a repertoire of high-performing single-morphology controllers, with a knowledge-distillation step that trains a single multi-morphology controller to mimic their input-output patterns. The distilled controller performs close to its teachers across diverse morphologies, generalizes zero-shot to unseen bodies, and serves as an efficient prior for rapid finetuning on new morphologies or tasks. The approach is architecture-agnostic, scales with the number of teachers, and is synergistic with ongoing advances in brain-body co-optimization and modular controller design, offering a practical pathway toward foundational multi-morphology controllers.

Abstract

Finding controllers that perform well across multiple morphologies is an important milestone for large-scale robotics, in line with recent advances via foundation models in other areas of machine learning. However, the challenges of learning a single controller to control multiple morphologies make the 'one robot one task' paradigm dominant in the field. To alleviate these challenges, we present a pipeline that: (1) leverages Quality Diversity algorithms like MAP-Elites to create a dataset of many single-task/single-morphology teacher controllers, then (2) distills those diverse controllers into a single multi-morphology controller that performs well across many different body plans by mimicking the sensory-action patterns of the teacher controllers via supervised learning. The distilled controller scales well with the number of teachers/morphologies and shows emergent properties. It generalizes to unseen morphologies in a zero-shot manner, providing robustness to morphological perturbations and instant damage recovery. Lastly, the distilled controller is also independent of the teacher controllers: we can distill the teachers' knowledge into any controller model, making our approach synergistic with architectural improvements and existing training algorithms for teacher controllers.
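
The distillation step in (2) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the teachers are toy linear policies rather than evolved controllers, the student is a single linear model fitted by full-batch gradient descent, and the student is told which body it drives through a hypothetical one-hot morphology id appended to the observation. It only shows the data flow: query each teacher, record observation-action pairs, and regress one student on all of them.

```python
import random

def make_teacher(weights):
    """Toy stand-in for a trained single-morphology controller (linear policy)."""
    def act(obs):
        return sum(w * o for w, o in zip(weights, obs))
    return act

def collect_dataset(teachers, samples_per_teacher, obs_dim, rng):
    """Query every teacher and record (student_input, teacher_action) pairs."""
    n = len(teachers)
    data = []
    for morph_id, teacher in enumerate(teachers):
        one_hot = [1.0 if i == morph_id else 0.0 for i in range(n)]
        for _ in range(samples_per_teacher):
            obs = [rng.uniform(-1.0, 1.0) for _ in range(obs_dim)]
            # Assumption: the student input is the observation plus a one-hot body id.
            data.append((obs + one_hot, teacher(obs)))
    return data

def predict(student, x):
    return sum(w * xi for w, xi in zip(student, x))

def mse(student, data):
    return sum((predict(student, x) - y) ** 2 for x, y in data) / len(data)

def distill(data, epochs=200, lr=0.05):
    """Fit one linear student to all teachers by gradient descent on the MSE."""
    dim = len(data[0][0])
    student = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            err = predict(student, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        student = [w - lr * g for w, g in zip(student, grad)]
    return student

rng = random.Random(0)
teachers = [make_teacher([rng.uniform(-1, 1) for _ in range(3)]) for _ in range(4)]
dataset = collect_dataset(teachers, samples_per_teacher=50, obs_dim=3, rng=rng)
loss_before = mse([0.0] * len(dataset[0][0]), dataset)
student = distill(dataset)
loss_after = mse(student, dataset)
```

A linear student cannot reproduce several different teachers exactly, so this toy only shows the imitation loss decreasing. Because the student is trained purely on recorded teacher outputs, a more expressive model could be swapped in without changing how the dataset is built.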

Paper Structure

This paper contains 11 sections and 11 figures.

Figures (11)

  • Figure 1: Morphologies used in the experiments to show the effectiveness of knowledge distillation for training a multi-morphology controller.
  • Figure 2: Training trajectories of isolated training on each morphology individually (solid lines) vs. the performance of each morphology during joint training on all (dashed lines). Lines show the mean values and the shaded areas show the standard errors, calculated over 3 repetitions. Reinforcement Learning (left) can find solutions that work well for multiple morphologies but ignore others. Evolutionary algorithms (right) find solutions that perform similarly on all morphologies, but they exhibit sub-optimal performance.
  • Figure 3: Performance of the multi-morphology controller relative to teacher single-morphology controllers for each morphology in the experiment (mean $\pm$ SE). Here, and in all figures, the dotted line marks equal performance. Knowledge distillation can successfully train a single controller to control multiple morphologies as well as controllers specifically optimized for individual morphologies.
  • Figure 4: An example map produced by the MAP-Elites algorithm. Each cell corresponds to a robot-controller pair, with its fitness shown as the color of the cell. The x-axis bins robots by their total number of voxels, while the y-axis bins them by their number of active voxels. MAP-Elites successfully evolves a variety of high-performing robots.
  • Figure 5: (left) Top 10% of individuals (29 in total) used as teachers for distilling a multi-morphology controller, marked with an x. (right) Performance of the distilled multi-morphology controller on each trained morphology compared to its original teacher controller across 10 runs with noise. Each data point is plotted and its color represents the fitness of the original controller. The mean point is labeled with an x. The distilled controller achieves almost perfect performance, matching the performance of the teacher single-morphology controllers.
  • ...and 6 more figures
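
The map in Figure 4 and the teacher selection in Figure 5 can be sketched as a small MAP-Elites loop. This is a toy under stated assumptions, not the paper's setup: the body is a hypothetical 3x3 voxel grid, `toy_fitness` is a placeholder for simulating a robot-controller pair, and only bodies are mutated. The two ideas taken from the captions are the archive keyed by (total voxels, active voxels) and the selection of the top 10% of elites as teachers.

```python
import random

WIDTH, CELLS = 3, 9            # hypothetical 3x3 body
EMPTY, PASSIVE, ACTIVE = 0, 1, 2

def descriptor(body):
    """Map cell of a body: (total voxels, active voxels), the two map axes."""
    total = sum(1 for v in body if v != EMPTY)
    active = sum(1 for v in body if v == ACTIVE)
    return (total, active)

def toy_fitness(body):
    """Placeholder score; a real run would simulate the robot and its controller."""
    return float(sum((i % WIDTH) + 1 for i, v in enumerate(body) if v == ACTIVE))

def mutate(body, rng):
    child = list(body)
    child[rng.randrange(len(child))] = rng.choice([EMPTY, PASSIVE, ACTIVE])
    return child

def map_elites(iterations, rng):
    archive = {}  # descriptor cell -> (fitness, body)
    for _ in range(iterations):
        if archive:
            parent = rng.choice(list(archive.values()))[1]
            child = mutate(parent, rng)
        else:
            child = [rng.choice([EMPTY, PASSIVE, ACTIVE]) for _ in range(CELLS)]
        cell, fit = descriptor(child), toy_fitness(child)
        # Keep the child only if its cell is empty or it beats the current elite.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child)
    return archive

def select_teachers(archive, fraction=0.1):
    """Return the top `fraction` of elites by fitness (at least one)."""
    ranked = sorted(archive.items(), key=lambda kv: kv[1][0], reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]

archive = map_elites(2000, random.Random(0))
teachers = select_teachers(archive)
```

Each entry of `teachers` is a `(cell, (fitness, body))` pair. In a full pipeline, these elites would be the controllers queried to build the distillation dataset.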