Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

Ron Vainshtein, Zohar Rimon, Shie Mannor, Chen Tessler

TL;DR

This work introduces Task Tokens, a parameter-efficient method for adapting Goal-Conditioned Behavior Foundation Models (GC-BFMs) such as MaskedMimic: a learned Task Encoder generates task-specific tokens while the base model stays frozen. By fusing Prior Tokens, Task Tokens, and State Tokens, the approach enables hybrid goal- and reward-driven control, with rapid convergence and high success rates across diverse humanoid tasks. Experiments show improved task adaptation, robustness to out-of-distribution perturbations, and motion that participants in human studies perceived as more realistic, with complementary gains when Task Tokens are combined with other prompting modalities. Overall, Task Tokens offer a practical path to specializing BFMs for complex control tasks without sacrificing the generalization and naturalness of the underlying motion prior.

Abstract

Recent advancements in imitation learning have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. While excelling at zero-shot generation of robust behaviors, BFMs often require meticulous prompt engineering for specific tasks, potentially yielding suboptimal results. We introduce "Task Tokens", a method to effectively tailor BFMs to specific tasks while preserving their flexibility. Our approach leverages the transformer architecture of BFMs to learn a new task-specific encoder through reinforcement learning, keeping the original BFM frozen. This allows incorporation of user-defined priors, balancing reward design and prompt engineering. By training a task encoder to map observations to tokens, used as additional BFM inputs, we guide performance improvement while maintaining the model's diverse control characteristics. We demonstrate Task Tokens' efficacy across various tasks, including out-of-distribution scenarios, and show their compatibility with other prompting modalities. Our results suggest that Task Tokens offer a promising approach for adapting BFMs to specific control tasks while retaining their generalization capabilities.

Paper Structure

This paper contains 24 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Task Tokens: Our approach combines three input sources: (1) Prior Tokens: optional tokens enabling user-defined behavioral priors from text prompts or joint conditions, (2) Task Token: generated by our learned Task Encoder that processes the current goal observation $g_t^i$, and (3) State Token: representing the current environment state $s_t^i$. The prior and state tokens are generated using the pre-trained encoders from the GC-BFM model. The frozen GC-BFM integrates these inputs to produce natural, task-optimized actions $a_t^i$. During training, the policy gradient objective is computed with respect to the BFM's actions, with gradients flowing through the frozen GC-BFM and back to the Task Encoder, enabling task-specific optimization without modifying the foundation model's parameters.
  • Figure 2: Multi-task adaptation. Task Tokens is an effective approach for adapting BFMs to new downstream tasks while preserving their prior knowledge. Task Tokens can be used alongside other prompting modalities to generate personalized and robust motions that solve new tasks.
  • Figure 3: Convergence curves for Strike. Task Tokens is sample efficient, adapting to new tasks in under 50M steps.
  • Figure 4: Out-of-distribution perturbations. We test the success rate on the steering task when changing the ground friction (left) and gravity (right). Task Tokens (both with and without J.C.) exhibit improved robustness.
  • Figure 5: Multi-modal prompting. When trained on the direction task, the policy often learns to walk backwards. Task Tokens enables adding human-defined priors through additional tokens. By combining orientation priors, the BFM is instructed to face the movement direction.
  • ...and 2 more figures
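
The token-fusion scheme described in the Figure 1 caption can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: `FrozenBFM` stands in for the frozen GC-BFM and uses mean-pooling plus a fixed linear head instead of a transformer, `TaskEncoder` is a single linear map, the reward is a made-up quadratic, and finite-difference gradient ascent replaces the policy gradient objective used in the paper. The only point it demonstrates is structural: the reward is measured on the frozen model's actions, yet only the Task Encoder's parameters change.

```python
import random

TOKEN_DIM = 4  # hypothetical token width


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class FrozenBFM:
    """Toy stand-in for the frozen GC-BFM (assumption: mean-pool + linear head)."""

    def __init__(self, rng, state_dim=3, action_dim=2):
        self.state_encoder = random_matrix(TOKEN_DIM, state_dim, rng)  # never updated
        self.action_head = random_matrix(action_dim, TOKEN_DIM, rng)   # never updated

    def act(self, prior_tokens, task_token, state):
        state_token = matvec(self.state_encoder, state)
        # Fuse optional prior tokens, the task token, and the state token.
        tokens = list(prior_tokens) + [task_token, state_token]
        pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(TOKEN_DIM)]
        return matvec(self.action_head, pooled)


class TaskEncoder:
    """The only trainable component: maps a goal observation to one token."""

    def __init__(self, rng, goal_dim=2):
        self.weights = random_matrix(TOKEN_DIM, goal_dim, rng)

    def encode(self, goal):
        return matvec(self.weights, goal)


def reward(action, goal):
    # Hypothetical task reward: negative squared distance between action and goal.
    return -sum((a - g) ** 2 for a, g in zip(action, goal))


def train(steps=200, lr=0.5, eps=1e-4, seed=0):
    rng = random.Random(seed)
    bfm, encoder = FrozenBFM(rng), TaskEncoder(rng)
    frozen_before = ([r[:] for r in bfm.state_encoder], [r[:] for r in bfm.action_head])
    state, goal = [0.1, -0.2, 0.3], [1.0, -1.0]

    def score():
        # Reward is computed on the frozen model's action, not on the token itself.
        return reward(bfm.act([], encoder.encode(goal), state), goal)

    initial = score()
    for _ in range(steps):
        grads = []
        for row in encoder.weights:
            grad_row = []
            for j in range(len(row)):
                original = row[j]
                row[j] = original + eps
                up = score()
                row[j] = original - eps
                down = score()
                row[j] = original
                # Central finite difference (swapped in for the policy gradient).
                grad_row.append((up - down) / (2 * eps))
            grads.append(grad_row)
        # Gradient ascent on the encoder weights only.
        for row, grad_row in zip(encoder.weights, grads):
            for j, g in enumerate(grad_row):
                row[j] += lr * g

    frozen_ok = frozen_before == (bfm.state_encoder, bfm.action_head)
    return initial, score(), frozen_ok


if __name__ == "__main__":
    initial, final, frozen_ok = train()
    print(f"reward before: {initial:.4f}, after: {final:.4f}, BFM unchanged: {frozen_ok}")
```

Passing a non-empty `prior_tokens` list to `act` mirrors the optional Prior Tokens input: extra tokens enter the same fused sequence without requiring any change to the encoder or to the frozen model.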