Frictional Q-Learning

Hyunwoo Kim, Hyo Kyung Lee

TL;DR

Frictional Q-Learning (FQL) introduces a geometric, friction-based constraint to mitigate extrapolation error in off-policy RL by treating the replay-buffer support as the tangent space of an action manifold. It uses a contrastive variational autoencoder to generate tangent-aligned actions and samples normal perturbations via an affine-transformed orthonormal basis, enabling a finite, controlled set of updates with convergence guarantees. A local stability bound ties directional perturbations to an anisotropy ratio and a tolerance, so that updates stay within the data-supported region. Empirically, FQL delivers robust, stable performance on MuJoCo continuous-control benchmarks and improves on BCQ and other baselines in imitation/offline settings.

Abstract

Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction in classical mechanics. From this perspective, the replay buffer is represented as a smooth, low-dimensional action manifold, where the support directions correspond to the tangential component, while the normal component captures the dominant first-order extrapolation error. This decomposition reveals an intrinsic anisotropy in value sensitivity that naturally induces a stability condition analogous to a friction threshold. To mitigate deviations toward unsupported actions, we propose Frictional Q-Learning, an off-policy algorithm that encodes supported actions as tangent directions using a contrastive variational autoencoder. We further show that an orthonormal basis of the orthogonal complement corresponds to normal components under mild local isometry assumptions. Empirical results on standard continuous-control benchmarks demonstrate robust, stable performance compared with existing baselines.
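The tangent/normal split described in the abstract can be illustrated with plain linear algebra. The sketch below is not the paper's method: it assumes the tangent directions are already available as an orthonormal basis (the paper obtains them from a contrastive variational autoencoder), and the helper names `dot`, `normalize`, and `split_tangent_normal`, as well as the 3-D example vectors, are hypothetical.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def normalize(x):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(dot(x, x))
    return [xi / n for xi in x]

def split_tangent_normal(delta, tangent_basis):
    """Split delta into its projection onto span(tangent_basis) and the remainder.

    Assumes tangent_basis is an orthonormal list of vectors (an assumption of
    this sketch, not something the source specifies).
    """
    tangent = [0.0] * len(delta)
    for t in tangent_basis:
        coeff = dot(delta, t)
        tangent = [acc + coeff * tj for acc, tj in zip(tangent, t)]
    normal = [d - tj for d, tj in zip(delta, tangent)]
    return tangent, normal

# Hypothetical 3-D action space in which supported actions vary along one direction.
tangent_basis = [normalize([1.0, 1.0, 0.0])]
delta = [0.3, -0.1, 0.2]  # hypothetical action perturbation

tangent, normal = split_tangent_normal(delta, tangent_basis)
print("tangent part:", tangent)
print("normal part: ", normal)
```

The normal part is orthogonal to every tangent direction by construction. In the abstract's framing, that is the component responsible for the dominant first-order extrapolation error.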

Paper Structure

This paper contains 24 sections, 2 theorems, 38 equations, 4 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 4.1

Let $v \in \mathbb{R}^d$ be a unit vector satisfying $v^\top a = 0$. Under the local isometry assumption on the encoder $E_{M_{\mathcal{B}}}$, the latent representation $c = E_{M_{\mathcal{B}}}(s,v)$ satisfies $c^\top u = 0$ to first order, where $u = E_{M_{\mathcal{B}}}(s,a)$.
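A small numerical check can make the statement concrete in the idealized case where the encoder is an exact isometry. This is a sketch under strong simplifying assumptions: the state $s$ is ignored, and a fixed planar rotation (the hypothetical `rotation_encoder`) stands in for $E_{M_{\mathcal{B}}}$. The theorem itself only assumes local isometry, so the orthogonality holds to first order rather than exactly.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def rotation_encoder(x, angle=0.7):
    """Hypothetical stand-in for the encoder: a 2-D rotation, an exact isometry."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [cos_a * x[0] - sin_a * x[1], sin_a * x[0] + cos_a * x[1]]

a = [0.6, 0.8]    # hypothetical supported action
v = [-0.8, 0.6]   # unit vector with v^T a = 0

u = rotation_encoder(a)  # latent code of the action
c = rotation_encoder(v)  # latent code of the orthogonal direction

print("v . a =", dot(v, a))
print("c . u =", dot(c, u))
```

Because a rotation preserves inner products, $c^\top u$ equals $v^\top a$ up to floating-point rounding. A learned encoder would only approximate this near the data.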

Figures (4)

  • Figure 1: Consider a body of mass $m$ resting on a plane inclined at an angle $\theta$ relative to the horizontal. The gravitational force $mg$ acts vertically downward and can be decomposed into two components with respect to the plane: a tangential component and a normal component perpendicular to the surface (Newton, 1833).
  • Figure 2: On a smooth action manifold $M_{\mathcal{B}}$, unit vectors of the subspaces $T_s M_{\mathcal{B}}$ and $N_s M_{\mathcal{B}}$ span the space $\mathcal{P}$, with unit direction $u(\theta)$.
  • Figure 3: Average return (solid line) and half of the standard deviation (shaded area) across five independent experiments with different random seeds in continuous-control environments. For visual clarity, mean curves are smoothed with an exponential moving average.
  • Figure 4: Average return (solid line) and half of the standard deviation (shaded area) of imitation learning across five independent experiments with different random seeds in continuous-control environments. The black line indicates the average episodic return for trajectories in the replay buffer. For visual clarity, mean curves are smoothed with an exponential moving average.
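For reference, the inclined-plane setup of Figure 1 leads to the textbook static-friction condition that the friction analogy refers to. The coefficient of static friction $\mu_s$ and the labels $F_{\parallel}$, $F_{\perp}$ are symbols introduced here for illustration; they do not appear in the text above, and this is the classical mechanics result only, not the paper's stability bound.

```latex
\begin{align}
F_{\parallel} &= mg\sin\theta, \qquad F_{\perp} = mg\cos\theta, \\
mg\sin\theta &\le \mu_s\, mg\cos\theta \iff \tan\theta \le \mu_s,
\qquad 0 \le \theta < \tfrac{\pi}{2}.
\end{align}
```

The body stays at rest as long as the tangential pull does not exceed the maximum static friction, which is proportional to the normal force.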

Theorems & Definitions (5)

  • Theorem 4.1
  • Lemma 1.1
  • proof
  • Remark 1.2
  • proof