C-STEP: Continuous Space-Time Empowerment for Physics-informed Safe Reinforcement Learning of Mobile Agents

Guihlerme Daubt; Adrian Redder

Abstract

Safe navigation in complex environments remains a central challenge for reinforcement learning (RL) in robotics. This paper introduces Continuous Space-Time Empowerment for Physics-informed (C-STEP) safe RL, a novel measure of agent-centric safety tailored to deterministic, continuous domains. The measure can be used to design physics-informed intrinsic rewards that augment positive navigation reward functions. The resulting reward incorporates the agent's internal states (e.g., initial velocity) and forward dynamics to differentiate safe from risky behavior. By integrating C-STEP with navigation rewards, we obtain an intrinsic reward function that jointly optimizes task completion and collision avoidance. Numerical results demonstrate fewer collisions, reduced proximity to obstacles, and only marginal increases in travel time. Overall, C-STEP offers an interpretable, physics-informed approach to reward shaping in RL, contributing to the safety of agentic mobile robotic systems.

Paper Structure

This paper contains 21 sections, 1 theorem, 11 equations, 5 figures, 5 tables, and 1 algorithm.

INTRODUCTION
Background
Information Theory
Reinforcement Learning
Related Work
C-STEP: Continuous Space-Time Empowerment
Rethinking Empowerment for Deterministic Systems
From Empowerment to Safe RL Rewards
Sampling-based Approximation and Hyperparameters
Numerical Experiments
Point Maze Environment
Drone PyBullet Environment
Discussion
On Model Free Learning
Beyond Deterministic Dynamics
...and 6 more sections

Key Result

Proposition 1

Suppose that $f$ is Lipschitz continuous. Then the CST-Empowerment of eq:ODE is given by [...] for all $x\in \mathbb{R}^n$, where $\lambda(\cdot)$ denotes the $n$-dimensional volume on $\mathbb{R}^n$.

Figures (5)

Figure 1: Point maze navigation. Green spheres are agents, red is the goal, and yellow is the start. The empowered agent prefers the safer, wider path.
Figure 2: Visualization of the sampling-based reachable set approximation in 2D for different initial velocities. The blue areas indicate total reachable areas; red areas represent the approximated terminal set volume.
Figure 3: Top-down view of the PyBullet simulation environment. The agent starts in the yellow region and navigates to the green goal region. The maroon-colored obstacle's width and position are randomized in each episode, indicated by the red arrows, requiring adaptive navigation.
Figure 4: Average reward per episode for the empowered agent ($c=1$). (Evaluated every $10^3$ training steps.)
Figure 5: Average time spent under different distance thresholds for empowered (blue lines) and unempowered (orange line) agents.
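
The sampling-based reachable set approximation mentioned in the Figure 2 caption can be illustrated with a rough sketch. The following is not the paper's algorithm. It assumes a 2D double-integrator agent, uniformly sampled bounded accelerations, a single hypothetical box obstacle (`WALL`), and a grid-cell count as the area estimate. All function names and parameter values are illustrative. Trajectories that touch the obstacle are discarded, so a fast approach toward a wall yields a smaller terminal-set area than a slow one.

```python
import numpy as np

# Hypothetical axis-aligned obstacle: (x_min, x_max, y_min, y_max).
WALL = (0.3, 0.5, -2.0, 2.0)


def in_box(p, box):
    """Boolean mask of the rows of p (shape [N, 2]) that lie inside the box."""
    x_min, x_max, y_min, y_max = box
    return ((p[:, 0] >= x_min) & (p[:, 0] <= x_max)
            & (p[:, 1] >= y_min) & (p[:, 1] <= y_max))


def sample_terminal_positions(pos, vel, box=None, n_samples=2000,
                              horizon=1.0, n_steps=20, a_max=1.0, seed=0):
    """Terminal positions of collision-free rollouts of a 2D double integrator.

    Accelerations are drawn uniformly from [-a_max, a_max]^2 at every step
    (an assumed action model) and integrated with semi-implicit Euler.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    p = np.tile(np.asarray(pos, dtype=float), (n_samples, 1))
    v = np.tile(np.asarray(vel, dtype=float), (n_samples, 1))
    alive = np.ones(n_samples, dtype=bool)
    for _ in range(n_steps):
        a = rng.uniform(-a_max, a_max, size=(n_samples, 2))
        v = v + dt * a          # velocity update first
        p = p + dt * v          # then position with the new velocity
        if box is not None:
            alive &= ~in_box(p, box)  # drop rollouts that touch the obstacle
    return p[alive]


def terminal_set_area(points, cell=0.05):
    """Grid-occupancy estimate of the area covered by the terminal points."""
    if len(points) == 0:
        return 0.0
    cells = np.unique(np.floor(points / cell).astype(int), axis=0)
    return len(cells) * cell ** 2


# Same start position, two initial velocities toward the wall.
area_slow = terminal_set_area(sample_terminal_positions([0, 0], [0, 0], WALL))
area_fast = terminal_set_area(sample_terminal_positions([0, 0], [1, 0], WALL))
print(f"slow: {area_slow:.3f}  fast: {area_fast:.3f}")
```

The grid-cell count is a crude stand-in for a volume estimate and depends on `cell` and `n_samples`. It is used here only because it needs no dependency beyond NumPy.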

Theorems & Definitions (6)

Definition 1: State-Dependent Empowerment [salge2014empowerment]
Definition 2: Deterministic Agent Empowerment
Definition 3: CST-Empowerment
Proposition 1
Proof
Definition 4: Empowered Navigation Reward Function
