OGBench: Benchmarking Offline Goal-Conditioned RL

Seohong Park; Kevin Frans; Benjamin Eysenbach; Sergey Levine

TL;DR

OGBench introduces a multi-domain benchmark for offline goal-conditioned RL (GCRL), comprising 8 environment types, 85 datasets, and 6 reference algorithms, designed to evaluate capabilities such as goal stitching, long-horizon planning, and stochastic control. It argues that prior benchmarks do not adequately capture the diverse challenges of offline GCRL, and its experiments show that no single method dominates across tasks, especially under multi-goal evaluation. The paper details the task design principles, environment types, and dataset-generation controls, and discusses practical research opportunities and the potential of offline GCRL as a foundation for general-purpose RL pretraining. By providing reproducible, tunable datasets and reference implementations, OGBench aims to accelerate algorithmic progress and provide clearer research signals in offline GCRL.

Abstract

Offline goal-conditioned reinforcement learning (GCRL) is a major problem in reinforcement learning (RL) because it provides a simple, unsupervised, and domain-agnostic way to acquire diverse behaviors and representations from unlabeled data without rewards. Despite the importance of this setting, we lack a standard benchmark that can systematically evaluate the capabilities of offline GCRL algorithms. In this work, we propose OGBench, a new, high-quality benchmark for algorithms research in offline goal-conditioned RL. OGBench consists of 8 types of environments, 85 datasets, and reference implementations of 6 representative offline GCRL algorithms. We have designed these challenging and realistic environments and datasets to directly probe different capabilities of algorithms, such as stitching, long-horizon reasoning, and the ability to handle high-dimensional inputs and stochasticity. While representative algorithms may rank similarly on prior benchmarks, our experiments reveal stark strengths and weaknesses in these different capabilities, providing a strong foundation for building new algorithms. Project page: https://seohong.me/projects/ogbench
