
ADARL: Adaptive Low-Rank Structures for Robust Policy Learning under Uncertainty

Chenliang Li, Junyu Leng, Jiaxiang Li, Youbang Sun, Shixiang Chen, Shahin Shahrampour, Alfredo Garcia

TL;DR

Empirical results demonstrate that adaptive low-rank policy representations provide an efficient and principled alternative for robust RL under model uncertainty.

Abstract

Robust reinforcement learning (Robust RL) seeks to handle epistemic uncertainty in environment dynamics, but existing approaches often rely on nested min-max optimization, which is computationally expensive and yields overly conservative policies. We propose Adaptive Rank Representation (AdaRL), a bi-level optimization framework that improves robustness by aligning policy complexity with the intrinsic dimension of the task. At the lower level, AdaRL performs policy optimization under fixed-rank constraints with dynamics sampled from a Wasserstein ball around a centroid model. At the upper level, it adaptively adjusts the rank to balance the bias-variance trade-off, projecting policy parameters onto a low-rank manifold. This design avoids solving for adversarial worst-case dynamics while ensuring robustness without over-parameterization. Empirical results on MuJoCo continuous control benchmarks demonstrate that AdaRL not only consistently outperforms fixed-rank baselines (e.g., SAC) and state-of-the-art robust RL methods (e.g., RNAC, Parseval), but also converges toward the intrinsic rank of the underlying tasks. These results highlight that adaptive low-rank policy representations provide an efficient and principled alternative for robust RL under model uncertainty.
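The lower level trains on dynamics sampled from a Wasserstein ball around a centroid model. The sketch below shows one simple way to draw such samples under an explicit assumption: each candidate dynamics model is a point mass over a vector of physical parameters (for example, hypothetical mass and friction scales). In that case the Wasserstein distance between two models reduces to the Euclidean distance between their parameter vectors, so sampling the ball reduces to sampling an L2 ball. The function name `sample_dynamics_in_w_ball` and all numbers are illustrative, not taken from the paper.

```python
import numpy as np


def sample_dynamics_in_w_ball(centroid, radius, n_samples, rng):
    """Sample parameter vectors uniformly from an L2 ball around `centroid`.

    Hypothetical setup: each dynamics model is a Dirac measure on a parameter
    vector, so W_p(delta_x, delta_y) = ||x - y||_2 and the Wasserstein ball of
    radius `radius` is exactly this Euclidean ball.
    """
    centroid = np.asarray(centroid, dtype=float)
    d = centroid.size
    # Uniformly random directions on the unit sphere.
    directions = rng.standard_normal((n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling U^(1/d) by the radius makes the samples uniform in volume.
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    return centroid + radii * directions


rng = np.random.default_rng(0)
centroid = np.array([1.0, 0.5])  # hypothetical (mass scale, friction scale)
samples = sample_dynamics_in_w_ball(centroid, radius=0.1, n_samples=1000, rng=rng)
# Each row would parameterize one simulated environment for a lower-level update.
print(samples.shape)
```

Drawing models from inside the ball, rather than solving for the worst case on its boundary, reflects the abstract's point that AdaRL avoids an adversarial inner problem; whether the paper samples uniformly or with another distribution is not stated here.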

Paper Structure

This paper contains 18 sections, 1 theorem, 30 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Bias-Variance Trade-off of Rank-$r$ Approximation: Assume the ground-truth dynamics are given by $\mathcal{P}^\circ$ and that the Picard and Lipschitz assumptions hold. Let $(\theta^\circ,\omega^\circ)$ denote the solution of the optimization problem defined by the upper-level and lower-level equations when t…
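Theorem 1 is truncated above, so its exact bound is not reproduced here. As a purely illustrative linear-algebra analogue of the trade-off it names (not the theorem itself), the toy below truncates a noisy estimate of a rank-3 matrix to rank $r$: too small an $r$ discards signal (bias), while too large an $r$ re-admits noise (variance). The matrix sizes, noise level, and the helper `rank_r_truncation` are arbitrary assumptions.

```python
import numpy as np


def rank_r_truncation(A, r):
    """Best rank-r approximation of A in Frobenius norm (Eckart-Young), via SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


rng = np.random.default_rng(0)
n, k = 50, 3
# Hypothetical "clean" rank-k target and a noisy observation of it.
M = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
A = M + 0.5 * rng.standard_normal((n, n))

# Error against the clean target for every candidate rank.
errors = {r: np.linalg.norm(rank_r_truncation(A, r) - M) for r in range(1, n + 1)}
best_r = min(errors, key=errors.get)
print(best_r, round(errors[1], 2), round(errors[k], 2), round(errors[n], 2))
```

This oracle error curve needs the clean target, which is unavailable in practice; in AdaRL the upper level adjusts the rank during training instead.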

Figures (8)

  • Figure 1: Comparison of robust RL and the proposed AdaRL framework. Robust RL relies on repeatedly solving a nested min-max problem, while AdaRL formulates training as a bi-level optimization that alternates between policy optimization and adaptive rank adjustment to balance the bias-variance trade-off under epistemic uncertainty.
  • Figure 2: Performance of policy models under high model uncertainty in Walker2d-v3 (Left) and Hopper-v3 (Right). Results indicate that extremely low-rank representations lead to high bias, while overly high-rank models incur large approximation errors due to transition samples drawn from uncertain dynamics.
  • Figure 3: Training performance on MuJoCo tasks. The proposed AdaRL consistently outperforms standard SAC baselines under model uncertainty. The red dashed vertical lines indicate the boundaries between different iteration intervals.
  • Figure 4: Estimated rank from AdaRL throughout training. The intrinsic rank is the value identified by Tiwari et al. (2025). Left: Walker2d. Right: Hopper.
  • Figure 5: To impose the low-rank constraint, we insert an intermediate linear layer (without activation functions or bias) between the original two layers. This layer acts as a bottleneck that enforces a low-rank factorization of the weight matrix via SVD approximation.
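Figure 5 describes enforcing the low-rank constraint with a bias-free linear bottleneck obtained from an SVD approximation. A minimal NumPy sketch of that idea, assuming a single dense layer $y = Wx + b$ is replaced by $y = A(Bx) + b$ with $r$ bottleneck units: splitting the truncated SVD between the two factors makes $AB$ the best rank-$r$ approximation of $W$. The helper names `svd_bottleneck` and `bottleneck_forward` are hypothetical, and how the paper re-initializes or trains the factors after a rank change is not specified here.

```python
import numpy as np


def svd_bottleneck(W, r):
    """Split W (out x in) into bias-free factors A (out x r) and B (r x in).

    A @ B equals the truncated SVD of W, i.e. its best rank-r approximation.
    The singular values are shared evenly between the two factors.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root_s = np.sqrt(s[:r])
    A = U[:, :r] * root_s          # (out, r)
    B = root_s[:, None] * Vt[:r]   # (r, in)
    return A, B


def bottleneck_forward(x, A, B, bias):
    """x: (batch, in). Linear bottleneck without activation or bias, then the original bias."""
    h = x @ B.T                    # (batch, r): the intermediate low-rank layer
    return h @ A.T + bias          # (batch, out)


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))  # hypothetical pretrained weight
b = rng.standard_normal(64)
A, B = svd_bottleneck(W, r=5)
x = rng.standard_normal((8, 32))
y = bottleneck_forward(x, A, B, b)
print(np.linalg.matrix_rank(A @ B), W.size, A.size + B.size)
```

The saving grows with layer width: for a 256 x 256 layer at rank 16, the two factors hold 16 x (256 + 256) = 8,192 parameters versus 65,536 for the dense weight.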

Theorems & Definitions (1)

  • Theorem 1