
A Representation-Optimization Dichotomy for Lie-Algebraic Policy Optimization

Sooraj KC, Vivek Mishra

Abstract

Structured reinforcement learning and stochastic optimization often involve parameters evolving on matrix Lie groups such as rotations and rigid-body transformations. We establish a representation-optimization dichotomy for Lie-algebra-parameterized Gaussian policy objectives in the Lie Group MDP class: the gradient Lipschitz constant $L(R)$, which governs step size, convergence, and sample complexity of first-order methods, depends only on the algebraic type of $\mathfrak{g}$, uniformly over all objectives and independently of reward or transition structure. Specifically, $L = \mathcal{O}(1)$ for compact $\mathfrak{g}$ (e.g., $\mathfrak{so}(n)$, $\mathfrak{su}(n)$), and $L = \Theta(e^{2R})$ for $\mathfrak{g} = \mathfrak{gl}(n)$, with $\mathcal{O}(e^{2R})$ for all algebras containing a hyperbolic element. A key lower bound shows that this exponential growth cannot be canceled by interaction between the exponential map and the objective, making the dichotomy intrinsic to the algebra. This yields an algorithmic consequence: for compact algebras, radius-independent smoothness enables $\mathcal{O}(1/\sqrt{T})$ convergence using an $\mathcal{O}(n^2 J)$ Lie-algebraic projection step instead of $\mathcal{O}(d_{\mathfrak{g}}^3)$ Fisher inversion. A Kantorovich alignment bound $\alpha \ge 2\kappa/(\kappa+1)$ provides a computable condition under which this projection approximates natural-gradient updates. Experiments on $\mathrm{SO}(3)^J$ and $\mathrm{SE}(3)$ confirm the theory: constant smoothness for compact algebras, polynomial growth for $\mathrm{SE}(3)$, and alignment across condition regimes. The projection step achieves a 1.1-1.7x speedup over Cholesky-based Fisher inversion, with gains increasing at larger scales.
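The $\mathcal{O}(n^2 J)$ projection step is described only at this high level here. The sketch below shows one natural reading for $\mathfrak{so}(3)^J$, in which each $3 \times 3$ block of the ambient gradient is orthogonally (Frobenius) projected onto the algebra by antisymmetrization; the function names and the plain gradient update are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def project_so3(G):
    # Frobenius-orthogonal projection of a 3x3 matrix onto so(3):
    # its antisymmetric part. Cost O(n^2) per block, with n = 3.
    return 0.5 * (G - G.T)

def lie_projected_step(theta, grad, lr=1e-2):
    """One projected-gradient update on so(3)^J (illustrative sketch).

    theta, grad: arrays of shape (J, 3, 3). Total cost is O(n^2 J),
    versus O(d_g^3) = O((3J)^3) for inverting a dense Fisher matrix.
    """
    step = np.array([project_so3(g) for g in grad])
    return theta - lr * step

# Usage: J = 10 rotation blocks, random ambient gradient.
J = 10
theta = np.zeros((J, 3, 3))
grad = np.random.default_rng(0).standard_normal((J, 3, 3))
theta = lie_projected_step(theta, grad)
```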

Paper Structure

This paper contains 74 sections, 18 theorems, 62 equations, 5 figures, 14 tables, and 1 algorithm.

Key Result

Lemma 3.2

Consider Gaussian policies (eq:gaussian_policy) with features $\left\lVert \Phi_k(s)\right\rVert_F\le B_\Phi$, rewards $|r(s,a)|\le R_{\max}$, and parameters constrained to $\|\theta\|_F \le R_0$. Under Assumption ass:concentrability and Assumption ass:bounded_actions, Assumption ass:smoothness holds; the exponential factor and its explicit expression appear in Section sec:appendix_smoothness.
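The explicit constants are deferred to the appendix, but the radius dependence of the smoothness constant can be probed numerically. The sketch below is an illustrative assumption, not the paper's experiment: it estimates the gradient Lipschitz constant of a toy objective $f(\theta) = \|\exp(\theta) - I\|_F^2$ over Frobenius spheres of radius $R$ in $\mathfrak{so}(3)$ versus $\mathfrak{gl}(3)$, where the dichotomy predicts roughly constant versus rapidly growing estimates.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def f(theta):
    # Toy objective composed with the matrix exponential (illustrative only).
    return np.linalg.norm(expm(theta) - np.eye(3), "fro") ** 2

def num_grad(theta, eps=1e-5):
    # Central-difference gradient, entry by entry.
    g = np.zeros_like(theta)
    for i in range(3):
        for j in range(3):
            E = np.zeros((3, 3))
            E[i, j] = eps
            g[i, j] = (f(theta + E) - f(theta - E)) / (2 * eps)
    return g

def sample(R, compact):
    # Random algebra element of Frobenius norm R: antisymmetric part
    # for so(3), unconstrained for gl(3).
    X = rng.standard_normal((3, 3))
    if compact:
        X = 0.5 * (X - X.T)
    return X * (R / np.linalg.norm(X, "fro"))

def lipschitz_estimate(R, compact, pairs=100):
    # Empirical lower bound on L(R) from random pairs in the radius-R sphere.
    est = 0.0
    for _ in range(pairs):
        x, y = sample(R, compact), sample(R, compact)
        gx, gy = num_grad(x), num_grad(y)
        if compact:  # intrinsic gradient: project onto so(3)
            gx, gy = 0.5 * (gx - gx.T), 0.5 * (gy - gy.T)
        est = max(est, np.linalg.norm(gx - gy, "fro") / np.linalg.norm(x - y, "fro"))
    return est

for R in (0.5, 1.0, 2.0, 4.0):
    print(f"R={R}: so(3) ~{lipschitz_estimate(R, True):.2f}, "
          f"gl(3) ~{lipschitz_estimate(R, False):.2f}")
```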

Figures (5)

  • Figure 1: Left: Fisher--metric alignment histogram (200 measurements); vertical lines show empirical mean (solid) and $\kappa$-based bound (dashed). Right: Isotropy metrics during training ($J=10$, 5 seeds): (a) $\varepsilon_F(t)$ stays below $0.3$, (b) $\kappa(t) \in [1.9, 2.8]$, (c) alignment exceeds theoretical bound throughout.
  • Figure 2: Left: Controlled anisotropy---alignment vs. $\kappa$ (a) and return vs. $\varepsilon_F$ (b). Right: Sample efficiency---Lie policy (blue) vs. ambient (orange); shaded: $\pm 1$ std.
  • Figure 3: Log--log convergence rate. Left: Deterministic (quadratic on $\mathfrak{so}(3)^{10}$, $d_{\mathfrak{g}}=30$), slope $-0.98$. Right: Stochastic quadratic proxy ($d_{\mathfrak{g}}=30$, Gaussian noise $\sigma_g=1$), slope $-0.52 \pm 0.08$. Both consistent with compact-algebra predictions (Theorem \ref{thm:dichotomy} and Corollary \ref{cor:convergence_lpg}).
  • Figure 4: Effect of radius projection on $\mathrm{SE}(3)$ (5 seeds, shaded $\pm 1\sigma$). Left: Without projection ($B_\theta = \infty$), the parameter norm $\|\theta\|_F$ grows unboundedly (reaching ${\sim}18$); with projection ($B_\theta = 2$), it remains bounded at ${\sim}1.85$. Right: The theoretical gradient Lipschitz constant $L(R) = \mathcal{O}(R^2)$ for $\mathfrak{se}(3)$ (polynomial, not exponential, since $\mathfrak{se}(3)$ has no hyperbolic elements), computed from the observed $\|\theta\|_F$ trajectories. Without projection, $L$ grows roughly $18^2/1^2 = 324$-fold over training; with projection, $L$ stays within a constant factor of its initial value. (A sketch of this projection step appears after this list.)
  • Figure 5: Left: Convergence curves---LPG (blue), ambient PG (orange), natural gradient (green); shaded $\pm 1\sigma$ over 5 seeds. Right: Return vs. wall-clock time (log $x$-axis); LPG reaches ambient-PG's final return in ${\sim}1$ s vs. ${>}40$ s for natural gradient, reflecting the $\mathcal{O}(n^2 J)$ projection cost vs. $\mathcal{O}(d_{\mathfrak{g}}^3)$ Fisher inversion (Table \ref{tab:method_comparison}; see note on timing methodology).
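Figure 4's radius projection is consistent with a standard Euclidean ball projection; a minimal sketch under that assumption follows (the name `radius_projection` and its usage are illustrative, not the paper's code):

```python
import numpy as np

def radius_projection(theta, B_theta=2.0):
    # Project theta back into the Frobenius ball of radius B_theta,
    # keeping ||theta||_F (and hence the O(R^2) smoothness constant
    # on se(3)) bounded during training.
    norm = np.linalg.norm(theta)  # Frobenius norm for matrix input
    return theta if norm <= B_theta else theta * (B_theta / norm)
```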

Theorems & Definitions (54)

  • Lemma 3.2: Smoothness of RL objectives
  • Proof 1
  • Definition 4.1: Lie Group MDP
  • Remark 4.2: Relationship to prior MDP frameworks
  • Proposition 4.3: Intrinsic dimension reduction
  • Proposition 4.4: Losslessness of Lie-algebraic restriction
  • Proposition 5.3: Block-diagonal Fisher structure for $\mathrm{SO}(3)^J$ (see the sketch after this list)
  • Proof 2
  • Remark 5.4: Estimating $\alpha$ from data
  • Theorem 6.1: Representation--Optimization Dichotomy
  • ...and 44 more
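The block-diagonal Fisher structure of Proposition 5.3 is what makes blockwise natural-gradient solves cheap. Below is a minimal illustration under the assumption of one $3 \times 3$ SPD Fisher block per joint; all names are illustrative, and the assertion just checks agreement with a dense solve.

```python
import numpy as np

rng = np.random.default_rng(1)
J, d = 10, 3  # J copies of so(3); d_g = 3 generators per copy

# Illustrative block-diagonal Fisher: one SPD block per joint.
blocks = []
for _ in range(J):
    A = rng.standard_normal((d, d))
    blocks.append(A @ A.T + np.eye(d))  # SPD by construction

g = rng.standard_normal(J * d)  # stacked gradient

# Blockwise natural-gradient solve: J size-3 solves, O(J d^3) total,
# instead of one dense size-(3J) solve, O((3J)^3).
nat_grad = np.concatenate([np.linalg.solve(B, g[i*d:(i+1)*d])
                           for i, B in enumerate(blocks)])

# Sanity check against the dense block-diagonal Fisher.
F = np.zeros((J * d, J * d))
for i, B in enumerate(blocks):
    F[i*d:(i+1)*d, i*d:(i+1)*d] = B
assert np.allclose(nat_grad, np.linalg.solve(F, g))
```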