Contrastive Abstraction for Reinforcement Learning

Vihang Patil; Markus Hofmarcher; Elisabeth Rumetshofer; Sepp Hochreiter

TL;DR

This work tackles the challenge of learning efficient RL policies in long trajectories by proposing contrastive abstraction learning, a reward-free approach to discover abstract states. The method first learns state representations through self-supervised contrastive learning on sequentially proximal states and then uses modern Hopfield networks to map similar representations to fixed points, producing a tunable abstraction level controlled by the inverse temperature $\beta$. It further introduces a $\beta$-network to adapt the abstraction to a given state and explores three downstream deployment strategies: Abstract Policy, Meta Policy, and Planning over a graph of abstract states, demonstrating notable gains in sample efficiency and planning capability across diverse environments. The approach offers a versatile, reward-free precursor that can be amortized across tasks and provides a principled mechanism to adjust granularity of abstraction for different downstream objectives. Overall, contrastive abstraction learning advances representation learning for RL by combining self-supervised clustering with associative memory dynamics to yield scalable, task-agnostic abstractions.
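The Planning strategy mentioned above can be illustrated with a small sketch. It assumes that each trajectory has already been converted into a sequence of abstract-state ids (for example, fixed-point indices) and uses breadth-first search as a stand-in planner; the function names `build_abstract_graph` and `plan` and the toy trajectories are hypothetical and not taken from the paper.

```python
from collections import deque

def build_abstract_graph(abstract_trajectories):
    """Add a directed edge a -> b whenever b directly follows a in a trajectory."""
    graph = {}
    for traj in abstract_trajectories:
        for a, b in zip(traj, traj[1:]):
            graph.setdefault(a, set())
            graph.setdefault(b, set())
            if a != b:  # staying in the same abstract state adds no edge
                graph[a].add(b)
    return graph

def plan(graph, start, goal):
    """Breadth-first search; returns a shortest abstract-state path or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy trajectories of abstract-state ids (hypothetical data).
trajs = [[0, 0, 1, 1, 2], [2, 3, 3, 4], [1, 4]]
graph = build_abstract_graph(trajs)
path = plan(graph, start=0, goal=4)
```

Because the graph is built over abstract states rather than raw states, the search space stays small even when the underlying trajectories are long.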

Abstract

Learning agents with reinforcement learning is difficult when dealing with long trajectories that involve a large number of states. To address these learning problems effectively, the number of states can be reduced by abstract representations that cluster states. In principle, deep reinforcement learning can find abstract states, but end-to-end learning is unstable. We propose contrastive abstraction learning to find abstract states, where we assume that successive states in a trajectory belong to the same abstract state. Such abstract states may be basic locations, achieved subgoals, inventory, or health conditions. Contrastive abstraction learning first constructs clusters of state representations by contrastive learning and then applies modern Hopfield networks to determine the abstract states. The first phase of contrastive abstraction learning is self-supervised learning, where contrastive learning forces states with sequential proximity to have similar representations. The second phase uses modern Hopfield networks to map similar state representations to the same fixed point, i.e., to an abstract state. The level of abstraction can be adjusted by determining the number of fixed points of the modern Hopfield network. Furthermore, contrastive abstraction learning does not require rewards and facilitates efficient reinforcement learning for a wide range of downstream tasks. Our experiments demonstrate the effectiveness of contrastive abstraction learning for reinforcement learning.

Paper Structure (46 sections, 2 theorems, 7 equations, 6 figures, 1 table)

Introduction
Intuition for our approach.
Method
Contrastive Learning of Sequential Proximal States
Sampling a Positive Pair
Abstractions via Modern Hopfield Networks
Controlling the Level of Abstraction
Training a $\beta$-network.
Using Abstraction for Downstream Tasks
Learning a policy from abstract states (Abstract Policy).
Learning sub-policies as meta-actions (Meta Policy).
Goal-conditioned planning from a graph over abstract states (Planning).
Experiments
Environments
Contrastive and Abstract Representation
...and 31 more sections

Key Result

Theorem A1

With query $\bm{\xi}$, after one update the distance of the new point $f(\bm{\xi})$ to the fixed point $\bm{u}_i^*$ is exponentially small in the separation $\Delta_i$. The precise bounds use the Jacobian $\mathrm{J} = \frac{\partial f(\bm{\xi})}{\partial \bm{\xi}}$ and its value $\mathrm{J}^m$ in [...]. For given $\epsilon$ and sufficiently large $\Delta_i$, we have $\left\| f(\bm{\xi}) - \bm{u}_i^* \right\| \ldots$
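A small numerical sketch can make the one-update retrieval claim concrete. It assumes the standard modern Hopfield update $f(\bm{\xi}) = \bm{X}^\top \mathrm{softmax}(\beta \bm{X} \bm{\xi})$ with stored patterns as rows of $\bm{X}$, and it measures the distance to the stored pattern rather than to the exact fixed point $\bm{u}_i^*$, which is an approximation that is reasonable only when patterns are well separated. The pattern count, dimension, noise level, and `beta=32.0` are illustrative choices, not values from the paper.

```python
import numpy as np

def hopfield_update(X, xi, beta):
    """One update xi_new = X^T softmax(beta * X xi); rows of X are stored patterns."""
    scores = beta * (X @ xi)
    p = np.exp(scores - scores.max())  # subtract the max for numerical stability
    p /= p.sum()
    return X.T @ p

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 64))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # 10 roughly orthogonal unit patterns
query = X[0] + 0.05 * rng.normal(size=64)      # noisy version of pattern 0

err_before = np.linalg.norm(query - X[0])
err_after = np.linalg.norm(hopfield_update(X, query, beta=32.0) - X[0])
```

In this setting a single update removes almost all of the noise, because the softmax weight on the other patterns decays exponentially with the similarity gap.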

Figures (6)

Figure 1: Contrastive abstraction learning. (a) The first phase applies self-supervised learning via contrastive learning to represent states with sequential proximity in a similar way. Using the InfoNCE objective, two sequentially close states are forced to have similar representations and non-close states dissimilar representations. (b) In the second phase, a modern Hopfield network maps similar representations to the same fixed point, which constitutes an abstract state. (c) In the last phase, downstream tasks are solved in the reduced state space.
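The InfoNCE objective in phase (a) can be sketched as follows. This is a minimal NumPy version that assumes in-batch negatives and cosine similarity; the function name `info_nce`, the `temperature` value, and the random embeddings are illustrative assumptions, and the paper's encoder and exact loss configuration are not reproduced here.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE over a batch: row i of `positives` is the positive for row i of
    `anchors`; every other row in the batch acts as a negative."""
    # L2-normalise so that the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # targets sit on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Positives that are small perturbations of their anchors give a low loss ...
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
# ... while mismatched positives give a high loss.
shuffled = info_nce(z, z[::-1].copy())
```

Minimising this loss pulls the representations of paired (sequentially close) states together and pushes the other states in the batch apart.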
Figure 2: Visualization of the learned state representations for different sampling techniques for contrastive learning. Left: Two trajectories (${\color{calRed}\star}$ and ${\color{calGreen}\blacktriangle}$) in the Maze2D environment. Right: Different sampling techniques of positive pairs for contrastive learning. Red points are states in the dataset, while blue and green are the trajectories from the left panel. Sampling from the Laplace distribution leads to representations well suited for abstraction, while Gaussian sampling does not yield clear clusters.
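One way to read the Laplace versus Gaussian comparison is as a choice of distribution for the temporal offset between an anchor state and its positive. The sketch below assumes a zero-mean offset that is rounded and clipped to the trajectory; the function name `sample_positive_pair`, the `scale` parameter, and the clipping rule are hypothetical and may differ from the paper's sampling procedure.

```python
import numpy as np

def sample_positive_pair(traj_len, rng, scale=2.0, dist="laplace"):
    """Return indices (t, t_pos) of an anchor and a temporally nearby positive."""
    t = int(rng.integers(traj_len))                  # anchor time step
    if dist == "laplace":
        offset = rng.laplace(loc=0.0, scale=scale)   # sharply peaked, heavy tails
    else:
        offset = rng.normal(loc=0.0, scale=scale)    # Gaussian alternative
    # Round to a whole step and keep the index inside the trajectory.
    t_pos = int(np.clip(t + np.rint(offset), 0, traj_len - 1))
    return t, t_pos

rng = np.random.default_rng(0)
pairs = [sample_positive_pair(100, rng) for _ in range(1000)]
```

The Laplace density concentrates more mass on very small offsets than a Gaussian of comparable scale while still allowing occasional larger jumps.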
Figure 3: Learned representations (blue) and fixed points (red) of an MHN for Maze2D and RedBlueDoor. The parameter $\beta$ of the MHN increases from left to right, thus increasing the number of fixed points. The learned representation forms clusters of states where corridors (top) or rooms (bottom) are close. These clusters are mapped to abstract states via an MHN. In the rooms example (bottom row), (a) corresponds to states where the agent is in Room 1 and the red door is closed, (b) corresponds to the red door being open, and (c) corresponds to the agent being in Room 2.
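The effect of $\beta$ on the number of fixed points can be reproduced on toy data. The sketch below assumes the standard modern Hopfield update iterated to convergence, with each stored representation also used as a query; the two-cluster data set, the `beta` values, and the helper names are illustrative assumptions and not the paper's setup.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def hopfield_fixed_point(X, xi, beta, n_iter=200, tol=1e-8):
    """Iterate xi <- X^T softmax(beta * X xi); rows of X are stored patterns."""
    for _ in range(n_iter):
        new = X.T @ softmax(beta * (X @ xi))
        if np.linalg.norm(new - xi) < tol:
            return new
        xi = new
    return xi

def count_fixed_points(X, beta, merge_tol=1e-3):
    """Count distinct end points reached when every stored pattern is a query."""
    reps = []
    for x in X:
        fp = hopfield_fixed_point(X, x, beta)
        if all(np.linalg.norm(fp - r) > merge_tol for r in reps):
            reps.append(fp)
    return len(reps)

# Two tight clusters of unit vectors, around (1, 0) and around (0, 1).
spread = np.array([-0.1, 0.0, 0.1])
angles = np.concatenate([spread, np.pi / 2 + spread])
X = np.stack([np.cos(angles), np.sin(angles)], axis=1)

counts = [count_fixed_points(X, beta) for beta in (0.1, 10.0, 2000.0)]
```

On this toy data, a small $\beta$ collapses everything to a single global fixed point, an intermediate $\beta$ yields one fixed point per cluster, and a very large $\beta$ keeps every stored pattern as its own fixed point.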
Figure 4: Visualization of all states (blue) in the memory, with the resulting fixed points (red) in CifarEnv. We compare the number of fixed points when the temperature parameter is fixed (a) and when the temperature parameter is learned (b). The learned temperature parameter reduces the state space to a number of fixed points equal to the number of Cifar classes in the environment.
Figure 5: We show training of policies from abstract states to solve four tasks in CifarEnv. Abstract Policy uses abstract states but the original action space. Meta Policy selects actions from an abstract action space and executes sub-policies. Contrastive Representation uses state representations from contrastive pre-training. For Original State, neither the action space nor the state space is changed. Both methods using abstract states can solve all tasks.
...and 1 more figure

Theorems & Definitions (3)

Theorem A1: Modern Hopfield Networks: Retrieval with One Update (Ramsauer:21)
Definition A1: Pattern Stored and Retrieved (Ramsauer:21)
Theorem A2: Modern Hopfield Networks: Exponential Storage Capacity (Ramsauer:21)
