Extreme Q-Learning: MaxEnt RL without Entropy
Divyansh Garg, Joey Hejna, Matthieu Geist, Stefano Ermon
TL;DR
Extreme Q-Learning (XQL) introduces a framework based on Extreme Value Theory (EVT) that directly estimates the soft-optimal value function in MaxEnt RL without sampling from a policy. By modeling the errors in Bellman backups as Gumbel-distributed, it derives a Gumbel regression objective whose solution is the LogSumExp value, giving a practical, entropy-free approach to MaxEnt RL in both online and offline settings. The method shows strong offline performance on the D4RL benchmarks, outperforming prior work by more than 10 points on Franka Kitchen, and moderate improvements over SAC and TD3 on online DM Control tasks. It also connects soft Q-learning with conservative Q-learning through KL-based conservatism. Overall, XQL offers a simpler, principled alternative to policy-centric MaxEnt methods.
Abstract
Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from economics. By doing so, we avoid computing Q-values using out-of-distribution actions, which is often a substantial source of error. Our key insight is to introduce an objective that directly estimates the optimal soft-value functions (LogSumExp) in the maximum entropy RL setting without needing to sample from a policy. Using EVT, we derive our Extreme Q-Learning framework and consequently online and, for the first time, offline MaxEnt Q-learning algorithms that do not explicitly require access to a policy or its entropy. Our method obtains consistently strong performance in the D4RL benchmark, outperforming prior work by 10+ points on the challenging Franka Kitchen tasks while offering moderate improvements over SAC and TD3 on online DM Control tasks. Visualizations and code can be found on our website at https://div99.github.io/XQL/.
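As a rough illustration of the key insight, the sketch below assumes the Gumbel regression objective takes the form E[exp(z) - z - 1] with z = (x - h) / beta. The temperature `beta` and the function names `gumbel_loss`, `soft_value`, and `fit_gumbel` are introduced here for illustration and do not appear in the text above. Under that assumption, minimizing the loss over a scalar h recovers beta * log E[exp(x / beta)], the LogSumExp-style soft value, using only the samples x and no policy. It is a toy NumPy example on fixed samples, not the authors' implementation.

```python
import numpy as np


def gumbel_loss(h, x, beta):
    """Assumed Gumbel regression loss: mean(exp(z) - z - 1), z = (x - h) / beta."""
    z = (x - h) / beta
    return np.mean(np.exp(z) - z - 1.0)


def soft_value(x, beta):
    """Closed-form minimizer of the loss: beta * log(mean(exp(x / beta))).

    Computed with the usual max-shift trick for numerical stability.
    """
    z = x / beta
    m = z.max()
    return beta * (m + np.log(np.mean(np.exp(z - m))))


def fit_gumbel(x, beta, lr=0.5, steps=500):
    """Minimize the loss over a scalar h by plain gradient descent.

    The learning rate and step count are illustrative choices for this toy
    problem, not tuned values from the paper.
    """
    h = float(np.mean(x))  # start from the ordinary mean
    for _ in range(steps):
        z = (x - h) / beta
        # d/dh mean(exp(z) - z - 1) = (1 - mean(exp(z))) / beta
        grad = (1.0 - np.mean(np.exp(z))) / beta
        h -= lr * grad
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)  # stand-in for sampled Q-values
    print("gradient descent:", fit_gumbel(x, beta=1.0))
    print("closed form:     ", soft_value(x, beta=1.0))
```

In this sketch, a small `beta` pushes the estimate toward the maximum of the samples and a large `beta` pushes it toward their mean, which is why a single regression target can stand in for the maximal value without evaluating out-of-distribution actions.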
