C-Learning: Learning to Achieve Goals via Recursive Classification

Benjamin Eysenbach; Ruslan Salakhutdinov; Sergey Levine

TL;DR

The paper reframes goal-conditioned reinforcement learning as predicting and controlling the future state distribution via a binary classifier, rather than relying on rewards. It introduces C-learning, an off-policy bootstrapping approach that converts classifier outputs into a density over future states and optimizes policies toward commanded goals. The authors prove convergence properties of the off-policy C-learning updates and compare with Q-learning and hindsight relabeling, showing more accurate density estimates and competitive task performance. Experiments across gridworld and continuous-control tasks, including Sawyer manipulation, demonstrate robustness, reduced hyperparameter sensitivity (no need for a goal-sampling ratio), and practical scalability.

Abstract

We study the problem of predicting and controlling the future state distribution of an autonomous agent. This problem, which can be viewed as a reframing of goal-conditioned reinforcement learning (RL), is centered around learning a conditional probability density function over future states. Instead of estimating this density function directly, we estimate it indirectly by training a classifier to predict whether an observation comes from the future. Via Bayes' rule, predictions from our classifier can be transformed into predictions over future states. Importantly, an off-policy variant of our algorithm allows us to predict the future state distribution of a new policy, without collecting new experience. This variant allows us to optimize functionals of a policy's future state distribution, such as the density of reaching a particular goal state. While conceptually similar to Q-learning, our work lays a principled foundation for goal-conditioned RL as density estimation, providing justification for goal-conditioned methods used in prior work. This foundation makes hypotheses about Q-learning, including the optimal goal-sampling ratio, which we confirm experimentally. Moreover, our proposed method is competitive with prior goal-conditioned RL methods.
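
The Bayes' rule step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classifier sees equal numbers of positive examples (states drawn from the future) and negative examples (states drawn from a marginal distribution), in which case the Bayes-optimal output C satisfies C / (1 - C) = p(future state | state, action) / p(future state). The Gaussian densities, the function names, and the 1-D toy state are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here only to build a toy example."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_optimal_classifier(future_density, marginal_density):
    """Probability that a state is a 'future' sample, assuming balanced classes."""
    return future_density / (future_density + marginal_density)


def classifier_to_density(c, marginal_density):
    """Invert Bayes' rule: turn a classifier output into a density estimate.

    Assumes balanced positives and negatives, so the odds c / (1 - c)
    equal the ratio of the future-state density to the marginal density.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("classifier output must lie in [0, 1)")
    return c / (1.0 - c) * marginal_density


# Toy setup (hypothetical): future states ~ N(1, 1), marginal states ~ N(0, 2^2).
candidate_state = 0.7
true_future = gaussian_pdf(candidate_state, mean=1.0, std=1.0)
marginal = gaussian_pdf(candidate_state, mean=0.0, std=2.0)

c = bayes_optimal_classifier(true_future, marginal)
recovered = classifier_to_density(c, marginal)

print(f"classifier output:  {c:.4f}")
print(f"recovered density:  {recovered:.6f}")
print(f"true density:       {true_future:.6f}")
```

In this toy case the recovered density matches the true one up to floating-point error, because the classifier is exactly Bayes-optimal. A learned classifier would only approximate this, and the marginal density is generally unknown, so in practice the odds ratio itself may be the more useful quantity.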
