Knowledge Diversion for Efficient Morphology Control and Policy Transfer

Fu Feng, Ruixiao Shi, Yucheng Xie, Jianlu Shen, Jing Wang, Xin Geng

TL;DR

This work tackles the difficulty of transferring policies across diverse agent morphologies and tasks by introducing DivMorph, a modular learning framework that uses SVD to decompose Transformer weights into shared learngenes and morphology-/task-specific tailors. It improves policy reuse and efficiency through dynamic soft gating based on morphology and task embeddings, enabling zero-shot generalization and substantial deployment savings. Empirical results on the UNIMAL benchmark show DivMorph delivers state-of-the-art cross-task transfer with roughly 3× higher sample efficiency and up to 17× model-size reduction for deployment, while maintaining or surpassing performance on training morphologies. Overall, DivMorph demonstrates that explicit knowledge disentanglement and modular routing can yield scalable, adaptable universal morphology control suitable for real-world robotic systems.

Abstract

Universal morphology control aims to learn a universal policy that generalizes across heterogeneous agent morphologies, with Transformer-based controllers emerging as a popular choice. However, such architectures incur substantial computational costs, resulting in high deployment overhead, and existing methods exhibit limited cross-task generalization, necessitating training from scratch for each new task. To this end, we propose DivMorph, a modular training paradigm that leverages knowledge diversion to learn decomposable controllers. DivMorph factorizes randomly initialized Transformer weights into factor units via SVD prior to training and employs dynamic soft gating to modulate these units based on task and morphology embeddings, separating them into shared learngenes and morphology- and task-specific tailors, thereby achieving knowledge disentanglement. By selectively activating relevant components, DivMorph enables scalable and efficient policy deployment while supporting effective policy transfer to novel tasks. Extensive experiments demonstrate that DivMorph achieves state-of-the-art performance, with a 3× improvement in sample efficiency over direct finetuning for cross-task transfer and a 17× reduction in model size for single-agent deployment.
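
The factorize-then-gate idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer size, the split of factor units into learngenes and tailors, the sigmoid-of-linear-projection gating, and the 0.5 pruning threshold are all assumptions made for illustration, and the names `compose`, `soft_gates`, and `prune` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one 64x64 weight matrix split into 64 rank-1 factor units.
D_OUT, D_IN, EMB = 64, 64, 16
N_SHARED, N_MORPH, N_TASK = 16, 24, 24  # assumed split, not taken from the paper

# 1) Factorize a randomly initialized weight matrix before training.
W0 = rng.standard_normal((D_OUT, D_IN))
U, S, Vt = np.linalg.svd(W0, full_matrices=False)  # W0 = U @ diag(S) @ Vt


def compose(gates):
    """Rebuild a weight matrix as the gate-weighted sum of rank-1 factor units."""
    return (U * (S * gates)) @ Vt


def soft_gates(morph_emb, task_emb, P_morph, P_task):
    """Assumed gating: sigmoid of a linear projection of each embedding."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return np.concatenate([
        np.ones(N_SHARED),             # learngenes: always active
        sigmoid(P_morph @ morph_emb),  # morphology-specific tailors
        sigmoid(P_task @ task_emb),    # task-specific tailors
    ])


def prune(gates, threshold=0.5):
    """Keep only units whose gate exceeds the threshold (deployment-time sketch)."""
    keep = gates > threshold
    return U[:, keep], S[keep] * gates[keep], Vt[keep]


# Hypothetical gate projections and embeddings (these would be learned in practice).
P_morph = rng.standard_normal((N_MORPH, EMB))
P_task = rng.standard_normal((N_TASK, EMB))
gates = soft_gates(rng.standard_normal(EMB), rng.standard_normal(EMB), P_morph, P_task)

W = compose(gates)             # gated weight matrix used in the forward pass
U_k, s_k, Vt_k = prune(gates)  # compact factors kept for a single agent
print("units kept:", s_k.size, "of", S.size)
print("factor parameters kept:", U_k.size + s_k.size + Vt_k.size,
      "of", U.size + S.size + Vt.size)
```

With every gate set to 1, `compose` reproduces the original matrix exactly, so the factorization changes nothing at initialization; the gates alone decide which units contribute for a given morphology and task. Pruning here only shrinks the stored factor bank for one layer and is not meant to reproduce the reported 17× figure.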

Paper Structure

This paper contains 23 sections, 8 equations, 8 figures.

Figures (8)

  • Figure 1: (a) Traditional universal morphology control trains a single controller for diverse morphologies but remains task-specific, requiring separate training for each new task. (b) DivMorph leverages knowledge diversion to train a modular network that decouples shared, morphology-specific, and task-specific components, enabling adaptive recomposition for universal morphology control with 3× higher sample efficiency on new tasks, and 17× smaller deployment models for a given agent morphology.
  • Figure 2: Overview of DivMorph. (a) Morphology-Aware Transformer. It encodes the robot’s modular structure as a token sequence and models inter-limb interactions through a unified sequence representation, providing a generalizable controller across diverse morphologies. (b) Knowledge Diversion. Each randomly initialized weight matrix is factorized via SVD into a set of factor units, categorized into shared learngenes, morphology-specific tailors, and task-specific tailors. A dynamic soft gating mechanism selects the relevant tailors for each input while jointly updating the shared learngenes, enabling modular and disentangled representations across morphologies and tasks.
  • Figure 3: Comparison of policy transfer performance on novel tasks with training morphologies. Training curves show the mean and standard deviation of rewards for 100 UNIMAL robots with training morphologies, averaged over 3 runs per task. DivMorph consistently outperforms baselines, demonstrating higher sample efficiency across all tasks.
  • Figure 4: Comparison of policy transfer performance on novel tasks with novel morphologies. Training curves show the mean and standard deviation of rewards for 100 UNIMAL robots with novel morphologies, averaged over 3 runs per task. DivMorph still maintains a clear advantage, further demonstrating its robust generalization and rapid adaptation to unseen morphologies and tasks.
  • Figure 5: Comparison of single-agent deployment performance on training and novel morphologies. Bars show the mean and standard deviation of rewards for 100 UNIMAL robots, averaged over 3 runs per task (a, b), together with the corresponding model parameter count (c). DivMorph attains substantial model compression while matching the teacher models (MetaMorph pre-trained policies) on training morphologies, which approximate an upper performance bound. On novel morphologies, DivMorph exhibits strong zero-shot generalization, at times even exceeding the teacher.
  • ...and 3 more figures