On Geometric Structures for Policy Parameterization in Continuous Control

Zhihao Lin

TL;DR

This paper addresses the mismatch between bounded action spaces and traditional unbounded Gaussian policies by introducing Geometric Action Control (GAC), which generates actions directly on the unit sphere from a direction $\boldsymbol{\mu}$ and a learnable concentration $\kappa$. Actions are produced via $\mathbf{a} = r \cdot \mathrm{normalize}\left( w(\kappa) \boldsymbol{\mu} + (1 - w(\kappa)) \boldsymbol{\xi} \right)$, where $w(\kappa) = \sigma(\kappa)$, $\boldsymbol{\xi}$ is uniform noise on the unit sphere, and $r$ is a scale factor. This keeps sampling at $O(d)$ and reduces the policy-head parameter count from $2d$ to $d{+}1$. The authors provide theoretical justification showing that the expected unnormalized action is aligned with $\boldsymbol{\mu}$ and that the concentration emulates von Mises-Fisher (vMF)-like behavior without Bessel functions. Empirical results on six MuJoCo tasks show competitive or superior performance, and ablations underline the importance of unit normalization and adaptive concentration. Because the approach requires no density evaluations or entropy computations, it offers a geometrically principled alternative that can yield robust, efficient control in high-dimensional settings, and it motivates a broader Geometric Simplicity Principle for RL policy design.
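The action-generation formula above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `sample_uniform_sphere` and `gac_action` are hypothetical, $\sigma$ is assumed to be the logistic sigmoid, and uniform spherical noise is drawn by normalizing an isotropic Gaussian vector, a standard $O(d)$ technique.

```python
import numpy as np


def sample_uniform_sphere(d: int, rng: np.random.Generator) -> np.ndarray:
    """Draw one point uniformly from the unit sphere S^{d-1} in O(d) time."""
    g = rng.standard_normal(d)            # isotropic Gaussian vector
    return g / np.linalg.norm(g)          # its direction is uniform on the sphere


def gac_action(mu, kappa: float, r: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sketch of spherical mixing: a = r * normalize(w*mu + (1-w)*xi)."""
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)          # direction head output, forced to unit norm
    w = 1.0 / (1.0 + np.exp(-kappa))      # assumption: w(kappa) = sigmoid(kappa)
    xi = sample_uniform_sphere(mu.shape[0], rng)
    v = w * mu + (1.0 - w) * xi           # unnormalized mixture
    n = max(np.linalg.norm(v), 1e-12)     # guard the measure-zero case v = 0
    return r * v / n                      # place the action on the radius-r sphere


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([0.0, 0.0, 1.0])
    for kappa in (-2.0, 0.0, 0.5, 1.0):
        # With r = 1 the action is a unit vector, so a @ mu is a cosine similarity.
        a = np.stack([gac_action(mu, kappa, 1.0, rng) for _ in range(2000)])
        print(f"kappa={kappa:+.1f}  mean cos(a, mu) = {np.mean(a @ mu):.3f}")
```

Running the demo should show the mean cosine similarity with $\boldsymbol{\mu}$ rising with $\kappa$, the same trend Figure A.1 visualizes. Note that the only random input is one $d$-dimensional Gaussian draw, so no rejection loop or Bessel-function evaluation is needed.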

Abstract

Standard stochastic policies for continuous control often rely on ad-hoc boundary-enforcing transformations (e.g., tanh), which can distort the underlying optimization landscape and introduce gradient pathologies. While alternative parameterizations on the unit manifold (e.g., directional distributions) are theoretically appealing, their computational cost (often requiring special functions or rejection sampling) has limited their practical use. We propose a novel, computationally efficient action generation paradigm that preserves the structural benefits of operating on a unit manifold. Our method decomposes the action into a deterministic directional vector and a learnable concentration scalar, enabling efficient interpolation between the target direction and uniform noise on the unit manifold. This design can reduce policy-head parameters by nearly 50% (from $2d$ to $d+1$) and maintains a simple $O(d)$ sampling cost, with no rejection sampling or special-function evaluations. Empirically, our method matches or exceeds state-of-the-art methods on standard continuous control benchmarks, with significant improvements (e.g., +37.6% and +112%) on high-dimensional locomotion tasks. Ablation studies confirm that both the unit-norm normalization and the adaptive concentration mechanism are essential to the method's success. These findings suggest that robust, efficient control can be achieved by explicitly respecting the structure of bounded action spaces rather than relying on complex, unbounded distributions. Code is available in the supplementary materials.
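The "nearly 50%" figure follows from simple arithmetic on head sizes. The sketch below assumes the Gaussian baseline outputs a mean and a (log-)standard deviation per action dimension; `head_output_sizes` is a hypothetical helper. Because a linear output layer on a width-$h$ feature has $h+1$ parameters per output, the same ratio also applies to that layer's weights.

```python
def head_output_sizes(d: int) -> tuple[int, int, float]:
    """Output counts for a diagonal-Gaussian policy head vs. a GAC-style head (sketch)."""
    gaussian = 2 * d                   # per-dimension mean and (log-)std
    gac = d + 1                        # unit direction mu (d values) plus scalar kappa
    reduction = 1.0 - gac / gaussian   # equals 1/2 - 1/(2d), approaching 50% as d grows
    return gaussian, gac, reduction


# Action dimensions of the Gymnasium MuJoCo v4 tasks evaluated in the paper.
ACTION_DIMS = {"Hopper-v4": 3, "HalfCheetah-v4": 6, "Walker2d-v4": 6,
               "Pusher-v4": 7, "Ant-v4": 8, "Humanoid-v4": 17}

for env, d in ACTION_DIMS.items():
    g, c, red = head_output_sizes(d)
    print(f"{env:15s} d={d:2d}  Gaussian={g:2d}  GAC={c:2d}  saving={red:.1%}")
```

The saving is exactly one third for Hopper ($d=3$) and about 47% for Humanoid ($d=17$), so "nearly 50%" is most accurate for the high-dimensional tasks.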

Paper Structure

This paper contains 33 sections, 1 theorem, 23 equations, 5 figures, 3 tables.

Key Result

Theorem 1

For GAC's spherical mixing operation, the expected unnormalized sample vector lies exactly along the mean direction, scaled by the mixing weight: $\mathbb{E}_{\boldsymbol{\xi}}[\mathbf{v}] = w(\kappa)\,\boldsymbol{\mu}$, where $\mathbf{v} = w(\kappa) \boldsymbol{\mu} + (1-w(\kappa)) \boldsymbol{\xi}$ is the unnormalized mixture, $\boldsymbol{\mu} \in \mathbb{S}^{d-1}$ is the mean direction, and $\boldsymbol{\xi} \sim \text{Uniform}(\mathbb{S}^{d-1})$ is uniform spherical noise.
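The identity follows from linearity of expectation: $\mathbb{E}[\mathbf{v}] = w(\kappa)\boldsymbol{\mu} + (1-w(\kappa))\,\mathbb{E}[\boldsymbol{\xi}]$, and $\mathbb{E}[\boldsymbol{\xi}] = \mathbf{0}$ because the uniform distribution on $\mathbb{S}^{d-1}$ is symmetric under $\boldsymbol{\xi} \mapsto -\boldsymbol{\xi}$. A quick Monte Carlo check, assuming $w = \sigma$ and using a hypothetical helper `mixture_mean_estimate`:

```python
import numpy as np


def mixture_mean_estimate(mu, kappa: float, n: int, seed: int = 0):
    """Monte Carlo estimate of E[v] for v = w*mu + (1-w)*xi, xi ~ Uniform(S^{d-1})."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)                        # unit mean direction
    w = 1.0 / (1.0 + np.exp(-kappa))                    # assumption: w = sigmoid(kappa)
    g = rng.standard_normal((n, mu.shape[0]))
    xi = g / np.linalg.norm(g, axis=1, keepdims=True)   # n uniform points on the sphere
    v = w * mu + (1.0 - w) * xi                         # unnormalized mixtures, shape (n, d)
    return v.mean(axis=0), w * mu                       # (estimate, Theorem 1 prediction)


est, pred = mixture_mean_estimate([1.0, 2.0, 2.0], kappa=0.5, n=200_000)
print("estimate:  ", np.round(est, 3))
print("prediction:", np.round(pred, 3))
```

The theorem concerns the unnormalized vector only. After normalization the mean still points along $\boldsymbol{\mu}$ by symmetry, but its length is no longer $w(\kappa)$.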

Figures (5)

  • Figure 1: Architecture of GAC. State $s$ is processed by a shared backbone, which branches into a direction head producing a unit vector $\boldsymbol{\mu}$, and a concentration head predicting $\kappa$. The final action is generated via spherical mixing, replacing traditional distributional sampling with direct geometric interpolation.
  • Figure 2: Learning curves on (a) Hopper-v4, (b) Walker2d-v4, (c) Pusher-v4, (d) HalfCheetah-v4, (e) Ant-v4, and (f) Humanoid-v4.
  • Figure 3: Ablation study on HalfCheetah-v4. Default GAC ($r=2.5$, adaptive $\kappa$) performs best.
  • Figure A.1: 3D visualization of GAC sample distributions for $\kappa \in \{-2, 0, 0.5, 1\}$. Arrows indicate target direction $\boldsymbol{\mu}$. Colors represent cosine similarity with $\boldsymbol{\mu}$ (blue=low, red=high). Higher $\kappa$ values produce more concentrated distributions.
  • Figure A.2: Distribution of pre-squashed Gaussian samples from a trained SAC policy. Red areas indicate saturated gradients ($|\tanh'(x)| < 0.05$), with 46.4% of samples falling into these regions. Dashed lines show the $[-1, 1]$ boundaries of $\tanh$. This mismatch between unbounded Gaussians and bounded action spaces motivates GAC's direct geometric approach.
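The saturation criterion in Figure A.2 can be made explicit. Since $\tanh'(x) = 1 - \tanh^2(x)$, the condition $|\tanh'(x)| < 0.05$ holds exactly when $|x| > \operatorname{artanh}(\sqrt{0.95}) \approx 2.18$. The sketch below computes this threshold and the saturated fraction of a batch of pre-squash samples; the helper names are hypothetical, and the synthetic Gaussian in the usage lines is only an illustration, not the trained SAC policy from the figure.

```python
import numpy as np


def saturation_threshold(eps: float = 0.05) -> float:
    """Smallest |x| with tanh'(x) < eps, using tanh'(x) = 1 - tanh(x)^2."""
    return float(np.arctanh(np.sqrt(1.0 - eps)))


def saturated_fraction(pre_tanh, eps: float = 0.05) -> float:
    """Fraction of pre-squash samples whose tanh derivative falls below eps."""
    x = np.asarray(pre_tanh, dtype=float)
    return float(np.mean(1.0 - np.tanh(x) ** 2 < eps))


# Illustrative usage on synthetic pre-squash samples (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=2.0, size=100_000)
print(f"threshold |x| > {saturation_threshold():.3f}")
print(f"saturated fraction: {saturated_fraction(x):.3f}")
```

Any pre-squash sample beyond roughly $\pm 2.18$ passes back less than 5% of its upstream gradient, which is why a wide Gaussian mean or scale can leave a large share of samples in the red regions.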

Theorems & Definitions (3)

  • Theorem 1: Expected Direction Control
  • Proof
  • Proof sketch