Contrastive Abstraction for Reinforcement Learning
Vihang Patil, Markus Hofmarcher, Elisabeth Rumetshofer, Sepp Hochreiter
TL;DR
This work tackles the challenge of learning RL policies efficiently in environments with long trajectories by proposing contrastive abstraction learning, a reward-free approach to discovering abstract states. The method first learns state representations through self-supervised contrastive learning on sequentially proximal states and then uses modern Hopfield networks to map similar representations to fixed points, producing an abstraction level that is tunable via the inverse temperature $\beta$. It further introduces a $\beta$-network that adapts the abstraction level to a given state and explores three downstream deployment strategies (Abstract Policy, Meta Policy, and Planning over a graph of abstract states), demonstrating notable gains in sample efficiency and planning capability across diverse environments. The approach offers a versatile, reward-free precursor that can be amortized across tasks and provides a principled mechanism for adjusting the granularity of abstraction to different downstream objectives. Overall, contrastive abstraction learning advances representation learning for RL by combining self-supervised clustering with associative memory dynamics to yield scalable, task-agnostic abstractions.
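To make the role of $\beta$ concrete, here is a minimal NumPy sketch of the fixed-point step. It rests on assumptions that are not taken from the paper: the stored patterns are the learned state representations themselves (the columns of `X`), every representation is used as a starting query, and the standard modern Hopfield update $\xi \leftarrow X\,\mathrm{softmax}(\beta X^\top \xi)$ is iterated until it stops changing. Queries that end at the same fixed point are grouped into one abstract state. The function names `hopfield_fixed_points` and `abstract_states`, the tolerances, and the toy two-cluster data are hypothetical, and the $\beta$-network is not modeled.

```python
import numpy as np

def softmax_columns(z):
    # Numerically stable softmax over axis 0 (the stored-pattern axis).
    z = z - z.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def hopfield_fixed_points(X, beta, n_iter=100, tol=1e-6):
    """Iterate xi <- X softmax(beta X^T xi), using every column of X as a query.

    X: (d, N) array whose columns are the stored patterns (assumed: state representations).
    Returns a (d, N) array holding the fixed point reached from each query.
    """
    Xi = X.copy()
    for _ in range(n_iter):
        Xi_new = X @ softmax_columns(beta * (X.T @ Xi))
        converged = np.max(np.abs(Xi_new - Xi)) < tol
        Xi = Xi_new
        if converged:
            break
    return Xi

def abstract_states(X, beta, merge_tol=1e-3):
    """Hypothetical helper: label each representation by the fixed point it converges to."""
    fixed = hopfield_fixed_points(X, beta)
    centers = []
    labels = np.empty(X.shape[1], dtype=int)
    for j in range(X.shape[1]):
        for k, c in enumerate(centers):
            if np.linalg.norm(fixed[:, j] - c) < merge_tol:
                labels[j] = k  # same fixed point as an earlier query
                break
        else:
            centers.append(fixed[:, j])  # a new fixed point, i.e. a new abstract state
            labels[j] = len(centers) - 1
    return labels, np.stack(centers, axis=1)

# Toy data: two groups of unit vectors, around 0 and around pi/2 radians.
angles = np.array([-0.1, 0.0, 0.1, np.pi / 2 - 0.1, np.pi / 2, np.pi / 2 + 0.1])
X = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 6)

for beta in (0.01, 20.0, 1e4):
    labels, centers = abstract_states(X, beta)
    print(f"beta={beta:g}: {centers.shape[1]} abstract state(s), labels={labels.tolist()}")
# expected: 1, 2 and 6 abstract states, respectively
```

On this toy data, a small $\beta$ collapses all six patterns into a single abstract state, an intermediate $\beta$ recovers the two groups, and a large $\beta$ leaves every pattern as its own fixed point, mirroring the coarse-to-fine control that $\beta$ provides. Grouping fixed points by a distance threshold is simply a convenient choice for the sketch.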
Abstract
Training agents with reinforcement learning is difficult when trajectories are long and involve a large number of states. To address these learning problems effectively, the number of states can be reduced by abstract representations that cluster states. In principle, deep reinforcement learning can find abstract states, but end-to-end learning is unstable. We propose contrastive abstraction learning to find abstract states, where we assume that successive states in a trajectory belong to the same abstract state. Such abstract states may be basic locations, achieved subgoals, inventory, or health conditions. Contrastive abstraction learning first constructs clusters of state representations by contrastive learning and then applies modern Hopfield networks to determine the abstract states. The first phase of contrastive abstraction learning is self-supervised learning, where contrastive learning forces states with sequential proximity to have similar representations. The second phase uses modern Hopfield networks to map similar state representations to the same fixed point, i.e.\ to an abstract state. The level of abstraction is adjusted by controlling the number of fixed points of the modern Hopfield network. Furthermore, \textit{contrastive abstraction learning} does not require rewards and facilitates efficient reinforcement learning for a wide range of downstream tasks. Our experiments demonstrate the effectiveness of contrastive abstraction learning for reinforcement learning.
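The first phase can be illustrated with a contrastive objective in which a state and a state a few steps later in the same trajectory form a positive pair, while the other states in the batch serve as negatives. The sketch below uses the standard InfoNCE loss as a stand-in; the exact loss, the size of the proximity window, the temperature, and the batching used in the paper may differ. The helpers `sample_positive_pairs` and `info_nce` are hypothetical names, and only the loss value is computed: actual training would backpropagate it through an encoder in an autodiff framework.

```python
import numpy as np

def sample_positive_pairs(T, window, rng):
    """Pair each time step t with a later step t + k, 1 <= k <= window, clipped to the trajectory end."""
    anchors = np.arange(T - 1)
    offsets = rng.integers(1, window + 1, size=anchors.shape)  # upper bound is exclusive
    partners = np.minimum(anchors + offsets, T - 1)
    return anchors, partners

def info_nce(z_anchor, z_partner, temperature=0.1):
    """Standard InfoNCE: row i of z_partner is the positive for row i of z_anchor,
    and all other rows of z_partner act as in-batch negatives."""
    z_anchor = z_anchor / np.linalg.norm(z_anchor, axis=1, keepdims=True)
    z_partner = z_partner / np.linalg.norm(z_partner, axis=1, keepdims=True)
    logits = z_anchor @ z_partner.T / temperature  # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # cross-entropy with the diagonal as targets

rng = np.random.default_rng(0)
T, d = 50, 16
z = rng.normal(size=(T, d))  # placeholder for encoder outputs along one trajectory
anchors, partners = sample_positive_pairs(T, window=3, rng=rng)
print(f"InfoNCE on untrained embeddings: {info_nce(z[anchors], z[partners]):.3f}")
```

As written, anchors that are close in time also appear as each other's negatives, which works against the sequential-proximity assumption; drawing a batch from different trajectories or distant time ranges is one way to reduce this effect.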
