Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting

Edoardo Cetin; Ahmed Touati; Yann Ollivier

TL;DR

This work addresses zero-shot reinforcement learning with Forward-Backward (FB) representations by tackling two bottlenecks: linear task encoding and optimization from offline data. It introduces auto-regressive task features to enable nonlinear, hierarchical task encodings, and it integrates offline-RL techniques (advantage weighting, evaluation-based sampling, and uncertainty modeling) to improve learning from reward-free offline datasets. Empirically, AR-FB with AW and AWARE delivers strong performance on Jaco Arm, DMC Locomotion, MOOD datasets, and D4RL benchmarks, matching or approaching task-specific offline methods in several settings. The results indicate that zero-shot behavioral foundation models can reach a substantial fraction of specialized offline-RL performance, with AR-FB offering moderate gains in spatial precision and out-of-dataset generalization when the offline data is adequate.
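The advantage weighting mentioned above refers to advantage-weighted policy extraction in the style of AWAC (Nair et al., 2020b). The sketch below shows the idea under stated assumptions: the FB value estimate is Q(s, a, z) = F(s, a, z)^T z, and the baseline is the value of a single action sampled from the current policy. The function names, the temperature `beta`, and the clipping constant `max_weight` are hypothetical choices for illustration, not taken from the paper's code.

```python
import numpy as np

def fb_q_values(F, z):
    """FB action values Q(s, a, z) = F(s, a, z)^T z.
    F: (batch, d) forward embeddings for each (s, a) pair; z: (d,) task vector."""
    return F @ z

def advantage_weights(F_data, F_policy, z, beta=1.0, max_weight=100.0):
    """AWAC-style weights exp(A / beta), where the advantage A compares the
    dataset action with the current policy's action under the FB value estimate."""
    adv = fb_q_values(F_data, z) - fb_q_values(F_policy, z)
    # Clip large weights for numerical stability (hypothetical constant).
    return np.minimum(np.exp(adv / beta), max_weight)

def weighted_bc_loss(log_probs, weights):
    """Advantage-weighted negative log-likelihood of dataset actions (to minimize)."""
    return -np.mean(weights * log_probs)

# Toy batch of two transitions with d = 2 (all values are made up).
z = np.array([1.0, 1.0])
F_data = np.array([[1.0, 0.0], [0.0, 1.0]])    # F(s, a_data, z)
F_policy = np.array([[0.5, 0.0], [0.0, 2.0]])  # F(s, pi(s, z), z)
w = advantage_weights(F_data, F_policy, z)
loss = weighted_bc_loss(np.array([-1.0, -2.0]), w)
```

The first dataset action scores higher than the policy's action, so its log-likelihood is upweighted (w > 1); the second scores lower and is downweighted (w < 1). In a real training loop the weights would be treated as constants, with gradients flowing only through the policy's log-probabilities.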

Abstract

The forward-backward representation (FB) is a recently proposed framework (Touati et al., 2023; Touati & Ollivier, 2021) for training behavior foundation models (BFMs) that aim to provide efficient zero-shot policies for any new task specified in a given reinforcement learning (RL) environment, without training for each new task. Here we address two core limitations of FB model training. First, FB, like all successor-feature-based methods, relies on a linear encoding of tasks: at test time, each new reward function is linearly projected onto a fixed set of pre-trained features. This limits both the expressivity and the precision of the task representation. We break the linearity limitation by introducing auto-regressive features for FB, which let fine-grained task features depend on coarser-grained task information. This can represent arbitrary nonlinear task encodings, thus significantly increasing the expressivity of the FB framework. Second, it is well known that training RL agents from offline datasets often requires specific techniques. We show that FB works well together with such offline RL techniques by adapting techniques from (Nair et al., 2020b; Cetin et al., 2024) for FB. This is necessary to get performance that does not flatline on some datasets, such as DMC Humanoid. As a result, we produce efficient FB BFMs for a number of new environments. Notably, in the D4RL locomotion benchmark, the generic FB agent matches the performance of standard single-task offline agents (IQL, XQL). In many setups, the offline techniques are needed to get any decent performance at all. The auto-regressive features have a positive but moderate impact, concentrated on tasks requiring spatial precision and task generalization beyond the behaviors represented in the training set.
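To make the linearity limitation concrete: in FB, a reward r is encoded as z_r = E_{s~ρ}[B(s) r(s)], and the policy for that task acts greedily on F(s, a, z_r)^T z_r. Because z_r is linear in r, doubling the reward exactly doubles the task vector. The sketch below contrasts this with one reading of auto-regressive features, in which each finer block of z is computed from features that also see the coarser blocks already inferred. It assumes a uniform average over sampled states in place of the expectation, and the block functions in the demo are hypothetical stand-ins for learned networks, not the authors' architecture.

```python
import numpy as np

def linear_task_embedding(B_feats, rewards):
    """Standard FB task inference: z_r = mean over states of B(s) * r(s).
    B_feats: (n_states, d) backward features; rewards: (n_states,)."""
    return B_feats.T @ rewards / len(rewards)

def autoregressive_task_embedding(states, rewards, B_blocks):
    """Hypothetical auto-regressive variant: block i of z is inferred with
    features B_i(s, z_<i) that also take the coarser blocks as input."""
    z_parts = []
    for B_i in B_blocks:
        z_prev = np.concatenate(z_parts) if z_parts else np.zeros(0)
        feats = B_i(states, z_prev)  # (n_states, d_i)
        z_parts.append(feats.T @ rewards / len(rewards))
    return np.concatenate(z_parts)

# Toy setup: three 2-D states, a reward on the first state only.
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r = np.array([1.0, 0.0, 0.0])

# Coarse block ignores z_prev; fine block is modulated by the coarse block.
blocks = [lambda s, zp: s, lambda s, zp: np.tanh(s * zp)]

z_lin, z_lin_2r = linear_task_embedding(states, r), linear_task_embedding(states, 2 * r)
z_ar, z_ar_2r = (autoregressive_task_embedding(states, r, blocks),
                 autoregressive_task_embedding(states, 2 * r, blocks))
```

Here `z_lin_2r` equals `2 * z_lin`, as any linear encoding must, whereas `z_ar_2r` differs from `2 * z_ar` because the fine block depends nonlinearly on the coarse one. The first block of the auto-regressive embedding coincides with the linear embedding, so the coarse task information is unchanged.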
