Cross-Embodiment Offline Reinforcement Learning for Heterogeneous Robot Datasets

Haruki Abe, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada

TL;DR

An embodiment-based grouping strategy is introduced in which robots are clustered by morphological similarity and the model is updated with a group gradient, which substantially reduces inter-robot conflicts and outperforms existing conflict-resolution methods.

Abstract

Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by uniting offline reinforcement learning (offline RL) with cross-embodiment learning. Offline RL leverages both expert and abundant suboptimal data, and cross-embodiment learning aggregates heterogeneous robot trajectories across diverse morphologies to acquire universal control priors. We perform a systematic analysis of this offline RL and cross-embodiment paradigm, providing a principled understanding of its strengths and limitations. To evaluate this offline RL and cross-embodiment paradigm, we construct a suite of locomotion datasets spanning 16 distinct robot platforms. Our experiments confirm that this combined approach excels at pre-training with datasets rich in suboptimal trajectories, outperforming pure behavior cloning. However, as the proportion of suboptimal data and the number of robot types increase, we observe that conflicting gradients across morphologies begin to impede learning. To mitigate this, we introduce an embodiment-based grouping strategy in which robots are clustered by morphological similarity and the model is updated with a group gradient. This simple, static grouping substantially reduces inter-robot conflicts and outperforms existing conflict-resolution methods.
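The grouping idea in the abstract can be illustrated with a minimal sketch. It assumes that a "group gradient" is the mean of the per-robot gradients within a cluster and that the model takes one step per group in turn; the abstract does not specify either detail. All names here (`group_robots`, `group_gradient`, `grouped_update`, `toy_grad`, the robot labels, and the hand-written group assignment) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def group_robots(groups):
    """Invert a robot -> group-id mapping into group-id -> sorted robot list."""
    members = defaultdict(list)
    for robot, gid in groups.items():
        members[gid].append(robot)
    return {gid: sorted(robots) for gid, robots in members.items()}


def group_gradient(params, robots, grad_fn):
    """Mean of the per-robot gradients for one group at the current params.

    Averaging within the group is an assumption of this sketch.
    """
    grads = [grad_fn(params, robot) for robot in robots]
    return [sum(col) / len(grads) for col in zip(*grads)]


def grouped_update(params, groups, grad_fn, lr=0.1):
    """Apply one gradient step per group, visiting groups in sorted order.

    The sequential, per-group step is an assumption of this sketch.
    """
    for _, robots in sorted(group_robots(groups).items()):
        g = group_gradient(params, robots, grad_fn)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


# Toy usage: each robot pulls a single shared parameter toward its own target.
targets = {"robot_a": 1.0, "robot_b": 3.0, "robot_c": -2.0}
# Static grouping, written by hand here; the paper derives it from morphology.
groups = {"robot_a": 0, "robot_b": 0, "robot_c": 1}


def toy_grad(params, robot):
    # Gradient of 0.5 * (p - target)^2 with respect to p.
    return [params[0] - targets[robot]]


new_params = grouped_update([0.0], groups, toy_grad, lr=0.5)
```

In this toy setting, robots in the same group contribute to a single averaged step, so a dissimilar robot in another group never has its gradient summed directly against theirs.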

Paper Structure (53 sections, 7 equations, 14 figures, 12 tables)

This paper contains 53 sections, 7 equations, 14 figures, 12 tables.

Figures (14)

  • Figure 1: Comparison of learning curves between cross-embodiment pre-trained networks and networks trained without cross-embodiment pre-training for Badger, Unitree G1, and Cassie.
  • Figure 2: Expert vs. 70% Suboptimal IQL performance across robots and average gradient cosine similarity $C$ on the 70% suboptimal dataset. Cells shaded blue indicate large positive transfer (CE exceeds Single by $>10$), while cells shaded light red indicate large negative transfer (CE falls below Single by $>10$).
  • Figure 3: (a) Embodiment-based similarity matrix (1 - min-max-normalized FGW distance between robot pairs); (b) Gradient cosine similarity matrix on the Expert Forward dataset from Section \ref{sec:gradient_conflicts}, with the color scale compressed for readability using a symlog normalization; (c) Scatter plot of embodiment-based similarity and mean gradient cosine similarity for all robot pairs.
  • Figure 4: Overview of Embodiment Grouping (EG) for cross-embodiment offline RL.
  • Figure 5: Effect of the number of Embodiment Grouping clusters $M$ on final return and wall-clock training time (mean $\pm$ s.e.).
  • ...and 9 more figures
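
The two quantities named in the Figure 3 caption can be sketched as follows. This is a rough illustration, assuming the pairwise FGW distances are already available as a square matrix and that min-max normalization is taken over the off-diagonal robot pairs, with the diagonal set to 1; the caption does not state these details. The function names `embodiment_similarity` and `cosine_similarity` are hypothetical, and the distance values are made up.

```python
import math


def embodiment_similarity(dist):
    """Return 1 - min-max-normalized distance for every robot pair.

    `dist` is a square matrix of pairwise distances for at least two robots.
    Normalizing over off-diagonal entries only is an assumption of this sketch.
    """
    n = len(dist)
    pairs = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    lo, hi = min(pairs), max(pairs)
    span = hi - lo
    sim = [[1.0] * n for _ in range(n)]  # a robot is fully similar to itself
    for i in range(n):
        for j in range(n):
            if i != j:
                norm = (dist[i][j] - lo) / span if span > 0 else 0.0
                sim[i][j] = 1.0 - norm
    return sim


def cosine_similarity(u, v):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


# Made-up distances between three robots, for illustration only.
toy_dist = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 4.0],
    [6.0, 4.0, 0.0],
]
toy_sim = embodiment_similarity(toy_dist)

# Opposed gradients give a negative cosine, which signals a conflict.
conflict = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

A negative cosine between two robots' gradients indicates that a step helping one robot would, to first order, hurt the other, which is the kind of conflict the abstract describes.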