C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents

Guihlerme Daubt, Adrian Redder

Abstract

Safe navigation in complex environments remains a central challenge for reinforcement learning (RL) in robotics. This paper introduces Continuous Space-Time Empowerment for Physics-informed (C-STEP) safe RL, a novel measure of agent-centric safety tailored to deterministic, continuous domains. This measure can be used to design physics-informed intrinsic rewards by augmenting positive navigation reward functions. The reward incorporates the agent's internal states (e.g., initial velocity) and forward dynamics to differentiate safe from risky behavior. By integrating C-STEP with navigation rewards, we obtain an intrinsic reward function that jointly optimizes task completion and collision avoidance. Numerical results demonstrate fewer collisions, reduced proximity to obstacles, and only marginal increases in travel time. Overall, C-STEP offers an interpretable, physics-informed approach to reward shaping in RL, contributing to safety for agentic mobile robotic systems.

Paper Structure (21 sections, 1 theorem, 11 equations, 5 figures, 5 tables, 1 algorithm)

Key Result

Proposition 1

Suppose that $f$ is Lipschitz continuous. Then the CST-Empowerment of eq:ODE is given by […] for all $x\in \mathbb{R}^n$, where $\lambda(\cdot)$ denotes the $n$-dimensional volume on $\mathbb{R}^n$.

Figures (5)

  • Figure 1: Point maze navigation. Green spheres are agents, red is the goal, and yellow is the start. The empowered agent prefers the safer, wider path.
  • Figure 2: Visualization of the sampling-based reachable set approximation in 2D for different initial velocities. The blue areas indicate total reachable areas; red areas represent the approximated terminal set volume.
  • Figure 3: Top-down view of the PyBullet simulation environment. The agent starts in the yellow region and navigates to the green goal region. The maroon-colored obstacle's width and position are randomized in each episode, indicated by the red arrows, requiring adaptive navigation.
  • Figure 4: Average reward per episode for the empowered agent ($c=1$). (Evaluated every $10^3$ training steps.)
  • Figure 5: Average time spent under different distance thresholds for empowered (blue lines) and unempowered (orange line) agents.
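
The sampling-based reachable set approximation mentioned in the caption of Figure 2 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2D double-integrator dynamics, the explicit Euler integration, the uniform acceleration sampling, the grid-cell area estimate, and the hypothetical names `sample_terminal_states`, `occupied_area`, and `is_free`. The sketch only shows how an agent's initial velocity can shrink the collision-free terminal set near an obstacle.

```python
import numpy as np

def sample_terminal_states(pos0, vel0, is_free=None, horizon=1.0, dt=0.05,
                           a_max=1.0, n_samples=2000, seed=0):
    """Roll out an assumed 2D double integrator under random accelerations.

    Returns the terminal positions of all rollouts that stayed collision-free.
    `is_free` is a hypothetical predicate mapping an (n, 2) array of positions
    to a boolean array of length n (True means not in collision).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / dt))
    pos = np.tile(np.asarray(pos0, dtype=float), (n_samples, 1))
    vel = np.tile(np.asarray(vel0, dtype=float), (n_samples, 1))
    alive = np.ones(n_samples, dtype=bool)
    for _ in range(n_steps):
        acc = rng.uniform(-a_max, a_max, size=(n_samples, 2))
        pos = pos + dt * vel  # explicit Euler step for position
        vel = vel + dt * acc  # explicit Euler step for velocity
        if is_free is not None:
            alive &= is_free(pos)  # discard rollouts that hit an obstacle
    return pos[alive]

def occupied_area(points, cell=0.05):
    """Approximate the area covered by the points by counting grid cells."""
    if len(points) == 0:
        return 0.0
    cells = np.unique(np.floor(points / cell).astype(np.int64), axis=0)
    return len(cells) * cell ** 2

# Usage: a wall occupies the half-plane x >= 0.5 (illustrative geometry).
wall_free = lambda p: p[:, 0] < 0.5
area_slow = occupied_area(sample_terminal_states([0, 0], [0.0, 0.0], wall_free))
area_fast = occupied_area(sample_terminal_states([0, 0], [1.5, 0.0], wall_free))
print(area_slow, area_fast)
```

In this toy setup the agent at rest keeps all of its sampled rollouts, whereas the agent already moving quickly toward the wall cannot brake in time, so its collision-free terminal set is empty. Taking the logarithm of such an area estimate would give a volume-based score in the spirit of the caption, but the estimator actually used in the paper is not specified in this summary.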

Theorems & Definitions (6)

  • Definition 1: State-Dependent Empowerment (Salge et al., 2014)
  • Definition 2: Deterministic Agent Empowerment
  • Definition 3: CST-Empowerment
  • Proposition 1
  • proof
  • Definition 4: Empowered Navigation Reward Function