GCNT: Graph-Based Transformer Policies for Morphology-Agnostic Reinforcement Learning

Yingbo Luo; Meibao Yao; Xueming Xiao

TL;DR

GCNT tackles universal control for robots with varying morphologies by integrating a Graph Convolutional Network-based morphology extractor with a Transformer that enables direct, distance-aware communication among limbs. The architecture combines a Limb Observation Module, an improved GCN, a Weisfeiler-Lehman module for global morphology, and a learnable distance embedding to fuse information, optimized via TD3 or PPO across benchmarks. Empirical results show state-of-the-art performance and robust zero-shot generalization to unseen morphologies and kinematic/dynamic variations, outperforming baselines that rely on pure Transformers or traversal-based graph methods. This approach offers a scalable path to morphology-agnostic reinforcement learning with strong cross-morphology transfer and robustness in complex robotic tasks.

Abstract

Training a universal controller for robots with different morphologies is a promising research trend, since it can significantly enhance the robustness and resilience of the robotic system. However, diverse morphologies yield state and action spaces of different dimensions, which makes them difficult to accommodate with traditional policy networks. Existing methods address this issue by modularizing the robot configuration, but they do not adequately extract and utilize the overall morphological information, which has been proven crucial for training a universal controller. To this end, we propose GCNT, a morphology-agnostic policy network based on an improved Graph Convolutional Network (GCN) and a Transformer. It exploits the fact that both the GCN and the Transformer can handle an arbitrary number of modules to achieve compatibility with diverse morphologies. Our key insight is that the GCN can efficiently extract the morphology information of robots, while the Transformer ensures that this information is fully utilized by allowing each node of the robot to communicate it directly. Experimental results show that our method can generate resilient locomotion behaviors for robots with different configurations, including zero-shot generalization to robot morphologies not seen during training. In particular, GCNT achieved the best performance on 8 tasks across the 2 standard benchmarks.
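The claim that a GCN can handle an arbitrary number of modules can be illustrated with a minimal sketch. It assumes the textbook graph convolution, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), not the improved GCN proposed in the paper, whose details are not given here. The morphologies, feature sizes, and function names below are hypothetical and chosen only for illustration. One weight matrix is applied to two limb graphs of different sizes, and each yields one embedding per limb.

```python
import math
import random


def normalized_adjacency(num_limbs, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected limb graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_limbs)]
         for i in range(num_limbs)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_limbs)]
            for i in range(num_limbs)]


def matmul(x, y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj_norm, features, weight):
    """One standard graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


rng = random.Random(0)
FEAT_DIM, HIDDEN_DIM = 4, 8  # assumed sizes, not taken from the paper

# A single weight matrix shared by every morphology.
weight = [[rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
          for _ in range(FEAT_DIM)]

# Hypothetical morphologies: (number of limbs, joints between limbs).
morphologies = {
    "chain3": (3, [(0, 1), (1, 2)]),
    "tree5": (5, [(0, 1), (0, 2), (1, 3), (2, 4)]),
}

embeddings = {}
for name, (n, edges) in morphologies.items():
    # Random stand-ins for per-limb observations.
    obs = [[rng.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(n)]
    embeddings[name] = gcn_layer(normalized_adjacency(n, edges), obs, weight)
    print(name, len(embeddings[name]), "limbs x",
          len(embeddings[name][0]), "features")
```

Because the weight matrix acts on per-limb features and the adjacency matrix only mixes neighbouring limbs, nothing in the layer depends on the limb count. A Transformer shares this property, since self-attention operates on a variable-length set of tokens.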
