Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning

Jiange Yang, Haoyi Zhu, Yating Wang, Gangshan Wu, Tong He, Limin Wang

TL;DR

The paper addresses the challenge of generalizing robot trajectory prediction by leveraging large-scale out-of-domain, action-free video data alongside small-scale in-domain demonstrations. It introduces Tra-MoE, a sparsely gated Mixture-of-Experts trajectory model with Top-1 gating to scale capacity while preserving constant FLOPs, and an adaptive policy conditioning mechanism that maps 2D trajectories to image observations via a learnable mask. Training proceeds with joint pre-training of the trajectory model on multi-domain data, followed by training a trajectory-guided policy with the trajectory model frozen, using losses that promote expert specialization and training stability. Across simulation and real-world experiments, Tra-MoE consistently outperforms dense baselines with matched parameters, and the adaptive conditioning further enhances policy performance by aligning trajectory cues with visual input.

Abstract

Learning from multiple domains is a primary factor that influences the generalization of a single unified robot system. In this paper, we aim to learn a trajectory prediction model by using broad out-of-domain data to improve its performance and generalization ability. The trajectory model is designed to predict any-point trajectories in the current frame given an instruction, and it can provide detailed control guidance for robotic policy learning. To handle the diverse out-of-domain data distribution, we propose a sparsely-gated MoE architecture (with a Top-1 gating strategy) for the trajectory model, coined Tra-MoE. The sparse activation design strikes a good balance between parameter cooperation and specialization, effectively benefiting from large-scale out-of-domain data while maintaining constant FLOPs per token. In addition, we introduce an adaptive policy conditioning technique that learns 2D mask representations for predicted trajectories; these masks are explicitly aligned with image observations to guide action prediction more flexibly. We perform extensive experiments in both simulation and real-world scenarios to verify the effectiveness of Tra-MoE and the adaptive policy conditioning technique. We also conduct a comprehensive empirical study on training Tra-MoE, demonstrating that Tra-MoE consistently outperforms the dense baseline model, even when the latter is scaled to match Tra-MoE's parameter count.
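To make the Top-1 gating idea concrete, here is a minimal sketch of a sparsely gated MoE feed-forward layer in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `Top1MoE`, the two-layer ReLU experts, the softmax router, and the scaling of the output by the winning gate probability are all hypothetical choices made for this example, and the losses the paper uses for expert specialization and training stability are not modeled.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class Top1MoE:
    """Hypothetical Top-1 gated MoE layer (illustrative assumption only)."""

    def __init__(self, dim, hidden, num_experts, seed=0):
        rng = random.Random(seed)
        self.dim, self.hidden, self.num_experts = dim, hidden, num_experts
        # Router: one logit per expert for each token.
        self.router = random_matrix(num_experts, dim, rng)
        # Each expert is a small two-layer feed-forward network.
        self.experts = [
            (random_matrix(hidden, dim, rng), random_matrix(dim, hidden, rng))
            for _ in range(num_experts)
        ]

    def forward(self, token):
        """Route one token to its single highest-scoring expert."""
        probs = softmax(matvec(self.router, token))
        k = max(range(self.num_experts), key=probs.__getitem__)
        w_in, w_out = self.experts[k]
        h = [max(0.0, v) for v in matvec(w_in, token)]  # ReLU
        # Assumption: scale the expert output by its gate probability.
        out = [probs[k] * v for v in matvec(w_out, h)]
        return out, k

    def expert_params(self):
        """Weights inside one expert; this is what a token actually uses."""
        return 2 * self.dim * self.hidden

    def total_params(self):
        return self.num_experts * (self.dim + self.expert_params())

    def active_params(self):
        """Router weights plus the one expert selected for a token."""
        return self.num_experts * self.dim + self.expert_params()
```

In this sketch, doubling `num_experts` doubles `total_params()` while `expert_params()` stays fixed, which is the sense in which capacity grows at roughly constant per-token compute. The router itself adds a small cost that grows with the number of experts.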

Paper Structure

This paper contains 16 sections, 5 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: (a) Training with small-scale, in-domain data. (b) Joint training with in-domain data and large-scale, out-of-domain data.
  • Figure 2: Training the trajectory prediction model from multiple domains. (a) Visualization of the datasets $\boldsymbol{\mathcal{D}_{ood}}$ and $\boldsymbol{\mathcal{D}_{in}}$. The former may contain additional environments, objects, skills, and embodiments. (b) Our pipeline: first co-training the trajectory prediction model and then adapting it for downstream policy learning.
  • Figure 3: (a) The pipeline of our sparsely-gated MoE-based trajectory model (Tra-MoE). (b) The pipeline of our trajectory-guided policy using the adaptive policy conditioning technique. Mapping means concatenating the trajectory mask with image observations, while learnable refers to setting each point in the trajectory mask as a learnable embedding.
  • Figure 4: Left: The hardware platform setup for the real-world experiments. Right: Demonstrations of the real-world evaluation tasks.
  • Figure 5: The quantitative relationship between downstream policy success rate and trajectory model performance.
  • ...and 1 more figures
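
The Figure 3 caption describes adaptive policy conditioning as building a trajectory mask whose points are learnable embeddings and concatenating that mask with the image observation. The sketch below illustrates one way this mapping could look, in plain Python. Everything here is an assumption for illustration: the function names, the normalized `(x, y)` coordinates in `[0, 1]`, the choice of one embedding per time step, and the channel-wise concatenation are hypothetical, and the embeddings are fixed lists rather than parameters updated by gradient descent.

```python
def rasterize_trajectories(tracks, step_embeddings, height, width):
    """Write one embedding per trajectory point into an H x W x E mask.

    tracks: list of tracks, each a list of (x, y) points with coordinates
            normalized to [0, 1] (assumption).
    step_embeddings: one embedding vector per time step (assumption); in a
            real model these would be learnable parameters.
    """
    embed_dim = len(step_embeddings[0])
    # Pixels that no trajectory passes through stay at zero.
    mask = [[[0.0] * embed_dim for _ in range(width)] for _ in range(height)]
    for track in tracks:
        for t, (x, y) in enumerate(track):
            col = min(width - 1, max(0, int(x * width)))
            row = min(height - 1, max(0, int(y * height)))
            mask[row][col] = list(step_embeddings[t])
    return mask


def concat_channels(image, mask):
    """Concatenate an H x W x C image with an H x W x E mask per pixel."""
    return [
        [pixel + m for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# Tiny usage example: a 4 x 4 RGB image and one two-step track.
image = [[[0.0, 0.0, 0.0] for _ in range(4)] for _ in range(4)]
tracks = [[(0.0, 0.0), (0.5, 0.5)]]
step_embeddings = [[1.0, 0.0], [0.0, 1.0]]

mask = rasterize_trajectories(tracks, step_embeddings, height=4, width=4)
conditioned = concat_channels(image, mask)  # shape: 4 x 4 x (3 + 2)
```

Because the mask lives on the same pixel grid as the image, the policy's visual encoder sees trajectory cues at the locations they refer to, which is the alignment with image observations that the caption and abstract describe.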