On Geometric Structures for Policy Parameterization in Continuous Control
Zhihao Lin
TL;DR
This paper addresses the mismatch between bounded action spaces and conventional unbounded Gaussian policies by introducing Geometric Action Control (GAC), which generates actions directly on the unit sphere using a direction $\boldsymbol{\mu}$ and a learnable concentration $\kappa$. Actions are produced via $\mathbf{a} = r \cdot \mathrm{normalize}\left( w(\kappa) \boldsymbol{\mu} + (1 - w(\kappa)) \boldsymbol{\xi} \right)$ with $w(\kappa) = \sigma(\kappa)$, where $\boldsymbol{\xi}$ is a noise sample. This gives simple $O(d)$ sampling and reduces the policy head's outputs from $2d$ to $d{+}1$. The authors show theoretically that the expected unnormalized action aligns with $\boldsymbol{\mu}$ and that the concentration parameter emulates von Mises-Fisher (vMF)-like behavior without Bessel functions. Empirically, GAC achieves competitive or superior performance across six MuJoCo tasks, and ablations underline the importance of unit normalization and adaptive concentration. The approach eliminates density evaluations and entropy computations, offering a geometrically principled alternative that can yield robust, efficient control in high-dimensional settings and inspires a broader Geometric Simplicity Principle for RL policy design.
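The sampling rule stated above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the summary does not spell out: $\boldsymbol{\xi}$ is drawn uniformly from the unit sphere by normalizing a standard Gaussian vector, $\sigma$ is the logistic sigmoid, and $\boldsymbol{\mu}$ is normalized to unit length before mixing. The function name `sample_gac_action` and the `eps` guard are hypothetical.

```python
import numpy as np


def sample_gac_action(mu, kappa, r=1.0, rng=None, eps=1e-12):
    """Sketch of a = r * normalize(w(kappa) * mu + (1 - w(kappa)) * xi).

    Assumptions (not confirmed by the source): xi is uniform on the unit
    sphere, w is the logistic sigmoid, and mu is renormalized to unit length.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Direction: project the raw direction vector onto the unit sphere.
    mu = np.asarray(mu, dtype=float)
    mu = mu / max(np.linalg.norm(mu), eps)

    # Noise: a normalized standard Gaussian is uniform on the unit sphere.
    xi = rng.standard_normal(mu.shape)
    xi = xi / max(np.linalg.norm(xi), eps)

    # Concentration weight w(kappa) in (0, 1); larger kappa means less noise.
    w = 1.0 / (1.0 + np.exp(-kappa))

    # Mix, renormalize, and scale to radius r. Every step is O(d).
    mixed = w * mu + (1.0 - w) * xi
    return r * mixed / max(np.linalg.norm(mixed), eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([3.0, 4.0, 0.0])
    for kappa in (-4.0, 0.0, 4.0):
        a = sample_gac_action(mu, kappa, r=1.0, rng=rng)
        # Cosine similarity with the target direction (both are unit vectors).
        print(kappa, float(a @ (mu / np.linalg.norm(mu))))
```

Under these assumptions every sample has norm exactly $r$, and as $\kappa$ grows the weight $w(\kappa)$ approaches 1, so the sample collapses onto $r\,\boldsymbol{\mu}$; no density, Bessel function, or rejection step is evaluated.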
Abstract
Standard stochastic policies for continuous control often rely on ad hoc boundary-enforcing transformations (e.g., tanh) that can distort the underlying optimization landscape and introduce gradient pathologies. While alternative parameterizations on the unit manifold (e.g., directional distributions) are theoretically appealing, their computational complexity (often requiring special functions or rejection sampling) has limited their practical use. We propose a novel, computationally efficient action generation paradigm that preserves the structural benefits of operating on a unit manifold. Our method decomposes the action into a deterministic directional vector and a learnable concentration scalar, enabling efficient interpolation between the target direction and uniform noise on the unit manifold. This design reduces policy head parameters by nearly 50\% (from $2d$ to $d+1$) and maintains a simple $O(d)$ sampling complexity, avoiding costly sampling procedures. Empirically, our method matches or exceeds state-of-the-art methods on standard continuous control benchmarks, with significant improvements (e.g., +37.6\% and +112\%) on high-dimensional locomotion tasks. Ablation studies confirm that both the normalization to unit norm and the adaptive concentration mechanism are essential to the method's success. These findings suggest that robust, efficient control can be achieved by explicitly respecting the structure of bounded action spaces, rather than relying on complex, unbounded distributions. Code is available in the supplementary materials.
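As a quick check of the "nearly 50\%" figure, the relative reduction in policy head outputs when moving from $2d$ (a mean and a scale per action dimension) to $d+1$ (a direction plus one concentration scalar) can be worked out directly. The dimensions $d = 6$ and $d = 17$ below are illustrative choices, not values taken from the paper.

$$
\frac{2d - (d+1)}{2d} = \frac{d-1}{2d} = \frac{1}{2} - \frac{1}{2d},
\qquad
d = 6:\ \frac{5}{12} \approx 41.7\%,
\qquad
d = 17:\ \frac{16}{34} \approx 47.1\%.
$$

The saving therefore approaches, but never reaches, 50\% as the action dimension $d$ grows.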
