
Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson

TL;DR

HyperDistill targets transformer-level performance at MLP-level inference cost for control across heterogeneous robot morphologies. A morphology-conditioned hypernetwork generates a per-robot MLP policy, and this student is trained by policy distillation from a universal transformer (TF) teacher. On UNIMAL, it matches the teacher on both training and unseen test morphologies while reducing model size by 6-14x and computational cost by 67-160x, depending on the environment. The authors attribute this efficiency to knowledge decoupling, the separation of inter-task from intra-task knowledge, a principle that could also improve inference efficiency in other domains.

Abstract

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
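
The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single context vector per robot, a linear hypernetwork, a one-hidden-layer base MLP, and a mean-squared-error distillation loss. All names and sizes (`CTX_DIM`, `generate_mlp`, `distillation_loss`, and so on) are hypothetical, and the teacher actions are random stand-ins for the output of a trained TF policy.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
CTX_DIM, OBS_DIM, HIDDEN, ACT_DIM = 8, 12, 16, 4

# Shapes of the parameters of the base MLP that the hypernetwork must produce.
SHAPES = {
    "W1": (OBS_DIM, HIDDEN),
    "b1": (HIDDEN,),
    "W2": (HIDDEN, ACT_DIM),
    "b2": (ACT_DIM,),
}


def init_hypernetwork(rng):
    """One linear head per base-MLP parameter tensor (an assumed design)."""
    return {
        name: rng.normal(0.0, 0.1, size=(CTX_DIM, int(np.prod(shape))))
        for name, shape in SHAPES.items()
    }


def generate_mlp(heads, context):
    """Map a robot's morphology context vector to that robot's MLP parameters."""
    return {
        name: (context @ heads[name]).reshape(SHAPES[name]) for name in heads
    }


def mlp_policy(params, obs):
    """Run the generated MLP; the hypernetwork is not needed at this stage."""
    hidden = np.tanh(obs @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def distillation_loss(student_actions, teacher_actions):
    """Assumed distillation objective: MSE between student and teacher actions."""
    return float(np.mean((student_actions - teacher_actions) ** 2))


rng = np.random.default_rng(0)
heads = init_hypernetwork(rng)

# Inter-task step: run once per robot to obtain its policy parameters.
context = rng.normal(size=CTX_DIM)
params = generate_mlp(heads, context)

# Intra-task step: every control step only evaluates the small MLP.
obs = rng.normal(size=(5, OBS_DIM))
actions = mlp_policy(params, obs)

# Stand-in for actions produced by a trained TF teacher on the same observations.
teacher_actions = rng.normal(size=(5, ACT_DIM))
loss = distillation_loss(actions, teacher_actions)
```

In this sketch the hypernetwork runs once per robot, while each control step evaluates only the generated MLP. That split is one way to read the knowledge decoupling the abstract describes: morphology-dependent (inter-task) computation happens once, and per-step (intra-task) computation stays cheap.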

Paper Structure (44 sections, 1 equation, 8 figures, 2 tables)

Figures (8)

  • Figure 1: The architecture of HyperDistill. Different colors highlight the correspondence between the parameters in the base network and the context embedding they condition on via HN. We only show one hidden layer in the base MLP for ease of illustration. More hidden layers can be easily added in a similar way.
  • Figure 2: The performance of different methods on the training robots in each environment.
  • Figure 3: The performance of different methods on the test robots in each environment.
  • Figure 4: The student's learning curves under different teacher choices. In the figure legend, "X $\rightarrow$ Y" means that we distill from teacher X into student Y.
  • Figure 5: Final generalization performance of HyperDistill and TF (oracle) w.r.t. the number of PD robots in different environments.
  • ...and 3 more figures