Stable Deep Reinforcement Learning via Isotropic Gaussian Representations

Ali Saheb, Johan Obando-Ceron, Aaron Courville, Pouya Bashivan, Pablo Samuel Castro

TL;DR

This work analyzes how non-stationarity in deep reinforcement learning destabilizes training and degrades representations. It advocates isotropic Gaussian representations, enforced via the Sketched Isotropic Gaussian Regularization (SIGReg), as a principled prior that yields stable tracking of drifting targets and maximizes entropy under a fixed variance budget. Theoretical analysis shows that isotropy provides uniform contraction across directions while Gaussian tails minimize drift variance, and empirical results across CIFAR-10 shifts, Atari PQN/PPO, and Isaac Gym demonstrate improved stability, reduced neuron dormancy, and higher performance. Overall, shaping representation geometry emerges as a robust, lightweight pathway to stabilizing learning in non-stationary, online RL settings with broad applicability across algorithms and domains.

Abstract

Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under a fixed variance budget, and encourage a balanced use of all representational dimensions--all of which enable agents to be more adaptive and stable. Building on this insight, we propose the use of Sketched Isotropic Gaussian Regularization for shaping representations toward an isotropic Gaussian distribution during training. We demonstrate empirically, over a variety of domains, that this simple and computationally inexpensive method improves performance under non-stationarity while reducing representation collapse, neuron dormancy, and training instability.
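
The maximal-entropy claim can be checked with a short, standard argument. The sketch below uses notation introduced here for illustration rather than taken from the paper: $z \in \mathbb{R}^d$ is the embedding, $\Sigma \succ 0$ its covariance, $c = \operatorname{tr}(\Sigma)$ the variance budget, and $\lambda_i$ the eigenvalues of $\Sigma$.

```latex
% Assumed notation (not from the paper): z, d, c = tr(Sigma), lambda_i.
\begin{align*}
h(z) &\le \tfrac{1}{2}\log\det(2\pi e\,\Sigma)
      = \tfrac{d}{2}\log(2\pi e) + \tfrac{1}{2}\sum_{i=1}^{d}\log\lambda_i
      && \text{(equality iff $z$ is Gaussian)} \\
\tfrac{1}{d}\sum_{i=1}^{d}\log\lambda_i
     &\le \log\Big(\tfrac{1}{d}\sum_{i=1}^{d}\lambda_i\Big) = \log\tfrac{c}{d}
      && \text{(Jensen; equality iff all $\lambda_i = c/d$)} \\
\Rightarrow\quad h(z) &\le \tfrac{d}{2}\log\!\big(2\pi e\,\tfrac{c}{d}\big),
      && \text{attained only by } z \sim \mathcal{N}\big(\mu, \tfrac{c}{d} I_d\big).
\end{align*}
```

Under these assumptions, both inequalities are tight only when the distribution is Gaussian and the covariance is a multiple of the identity, which is the isotropic Gaussian case the abstract refers to.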

Paper Structure

This paper contains 53 sections, 2 theorems, 58 equations, 17 figures, 9 tables.

Key Result

Theorem 3.1

Assume that $\Sigma_\phi(t) \succ 0$ is constant over time (e.g., enforced by regularization) and that $b_t$ is differentiable. Under gradient flow, the time derivative of $\Gamma$ satisfies

Figures (17)

  • Figure 1: Illustration of two tracking regimes. Left: the norm of the tracking error exhibits non-monotonic behavior and fails to converge, indicating unstable tracking. Right: the tracking error decreases monotonically and converges to zero, corresponding to stable tracking dynamics.
  • Figure 2: Directly shaping a multivariate distribution. SIGReg first projects the embeddings onto a small set of random directions ($p_i$: sketching), producing multiple univariate distributions. Each projection is then matched to the corresponding univariate target distribution.
  • Figure 3: Non-stationary CIFAR-10. Training under repeated label shuffling. The baseline shows poor recovery after each shift, with SIGReg loss spikes, rank collapse, and increased dormancy. Enforcing isotropic Gaussian representations stabilizes training, accelerates recovery, preserves rank, and reduces dormancy.
  • Figure 4: Two Atari-10 games (PQN). Without isotropy regularization, representations exhibit rank collapse, increased neuron dormancy, and early performance saturation. Encouraging isotropic geometry leads to improved representation quality and higher, more stable performance.
  • Figure 5: 2D PCA of embedding covariance over training. Without constraints, representations collapse onto a few dominant principal components. Encouraging isotropic Gaussian structure yields more evenly distributed variance and higher effective dimensionality, reflected in reduced concentration on the leading components (PQN: $[0.4,0.2]\rightarrow[0.9,0.8]$ vs. PQN+SIGReg: $[0.3,0.2]\rightarrow[0.1,0.1]$).
  • ...and 12 more figures
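
The sketch-then-match idea in the Figure 2 caption can be illustrated with a minimal, standard-library Python sketch. It is not the paper's SIGReg implementation: the univariate matching step here is a simple characteristic-function gap against N(0, 1), chosen for illustration, and every name below (`sliced_gaussian_penalty`, `cf_gap`, the grid of `t` values, the sample generators) is hypothetical.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere (normalized Gaussian draw)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cf_gap(samples, t_grid):
    """Mean squared gap between the empirical characteristic function of
    1-D samples and the N(0, 1) characteristic function exp(-t^2 / 2)."""
    n = len(samples)
    total = 0.0
    for t in t_grid:
        re = sum(math.cos(t * s) for s in samples) / n
        im = sum(math.sin(t * s) for s in samples) / n
        target = math.exp(-0.5 * t * t)  # imaginary part of the target is 0
        total += (re - target) ** 2 + im ** 2
    return total / len(t_grid)


def sliced_gaussian_penalty(embeddings, num_directions=16, seed=0):
    """Project embeddings onto random directions (the 'sketch') and average
    the univariate Gaussianity gap over those projections."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    t_grid = [0.25 * k for k in range(1, 13)]  # assumed grid: t in (0, 3]
    total = 0.0
    for _ in range(num_directions):
        p = random_unit_vector(dim, rng)
        projected = [sum(pi * zi for pi, zi in zip(p, z)) for z in embeddings]
        total += cf_gap(projected, t_grid)
    return total / num_directions


def sample_isotropic(n, dim, seed):
    """Toy data: n draws from N(0, I_dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def sample_collapsed(n, dim, seed):
    """Toy data with the same total variance, all placed on one coordinate."""
    rng = random.Random(seed)
    scale = math.sqrt(dim)
    return [[scale * rng.gauss(0.0, 1.0)] + [0.0] * (dim - 1) for _ in range(n)]


if __name__ == "__main__":
    print("isotropic:", sliced_gaussian_penalty(sample_isotropic(512, 8, seed=1)))
    print("collapsed:", sliced_gaussian_penalty(sample_collapsed(512, 8, seed=1)))
```

On these toy inputs the penalty should be close to zero for the isotropic sample and noticeably larger for the collapsed one, since most random projections of the collapsed data have a variance far from 1. In a training loop the same quantity would be computed with a differentiable array library and added to the task loss with a small weight; that weight is a tuning choice not specified in this listing.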

Theorems & Definitions (4)

  • Theorem 3.1: Tracking error dynamics
  • proof : Proof sketch
  • Theorem 2.1: Tracking error dynamics
  • proof
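
The intuition behind "tracking error dynamics" can be illustrated with a toy simulation. This is not the paper's Theorem 3.1, whose statement is truncated in this listing. It assumes a linear readout trained by gradient descent on a squared loss, so that along each eigen-direction of the feature covariance the error obeys e_i <- (1 - lr * lambda_i) * e_i - drift. The function name `track`, the step size, the drift magnitude, and the eigenvalue choices are all hypothetical.

```python
import math


def track(eigenvalues, lr=0.1, drift=0.01, steps=2000):
    """Simulate per-direction tracking error for a target that drifts by a
    constant amount each step; return the final error norm.

    Each direction contracts by (1 - lr * lam) per step, so its steady-state
    lag has magnitude drift / (lr * lam): small eigenvalues track poorly.
    """
    err = [0.0] * len(eigenvalues)
    for _ in range(steps):
        err = [(1.0 - lr * lam) * e - drift for lam, e in zip(eigenvalues, err)]
    return math.sqrt(sum(e * e for e in err))


# Two covariance spectra with the same total variance (trace = 4.0).
isotropic = [1.0, 1.0, 1.0, 1.0]
anisotropic = [3.4, 0.4, 0.1, 0.1]

if __name__ == "__main__":
    print("isotropic lag:  ", track(isotropic))
    print("anisotropic lag:", track(anisotropic))
```

With equal eigenvalues every direction contracts at the same rate and the lag is spread evenly. With the same trace concentrated on one direction, the weak directions dominate the error, which is the uniform-contraction argument in miniature.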