
Physics-Informed Policy Optimization via Analytic Dynamics Regularization

Namai Chandra, Liu Mohan, Zhihao Gu, Lin Wang

Abstract

Reinforcement learning (RL) has achieved strong performance in robotic control; however, state-of-the-art policy learning methods, such as actor-critic methods, still suffer from high sample complexity and often produce physically inconsistent actions. This limitation arises because neural policies must implicitly rediscover complex physics from data alone, even though accurate dynamics models are readily available in simulators. In this paper, we introduce a physics-informed RL framework, called PIPER, that integrates analytical soft physics constraints directly into neural policy optimization. At the core of our method is a differentiable Lagrangian residual used as a regularization term within the actor's objective. This residual, extracted from a robot's simulator description, biases policy updates towards dynamically consistent solutions. Crucially, the physics integration is realized through an additional loss term during policy optimization and requires no alterations to existing simulators or core RL algorithms. Extensive experiments demonstrate that our method significantly improves learning efficiency, stability, and control accuracy, establishing a new paradigm for efficient and physically consistent robotic control.
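
As a rough illustration of the regularization idea described in the abstract, the sketch below adds a squared dynamics residual to an actor loss. The abstract does not state the exact residual, so this assumes the textbook equation of motion for a frictionless single pendulum, m l^2 q'' + m g l sin(q) = tau. The function names, the weight `lam`, and the tuple layout are hypothetical, not taken from the paper. Plain Python is used for readability; an actual implementation would need an autodiff framework so the residual is differentiable with respect to the policy parameters.

```python
import math


def lagrangian_residual(q, qdd, tau, m=1.0, l=1.0, g=9.81):
    """Residual of the assumed pendulum dynamics: m*l^2*qdd + m*g*l*sin(q) - tau.

    It is zero when the torque tau is consistent with the angle q and the
    angular acceleration qdd under this (assumed) model.
    """
    return m * l ** 2 * qdd + m * g * l * math.sin(q) - tau


def regularized_actor_loss(base_loss, samples, lam=0.1):
    """Add a soft physics penalty to a base actor loss (hypothetical helper).

    samples is a list of (q, qdd, tau) tuples; lam weights the mean squared
    residual, so lam=0 recovers the unregularized loss.
    """
    penalty = sum(lagrangian_residual(q, qdd, tau) ** 2
                  for q, qdd, tau in samples) / len(samples)
    return base_loss + lam * penalty


# A torque that matches the assumed dynamics adds no penalty,
# while an inconsistent torque increases the loss.
consistent = [(0.0, 2.0, 2.0), (math.pi / 2, 0.0, 9.81)]
inconsistent = [(0.0, 0.0, 1.0)]
loss_consistent = regularized_actor_loss(0.5, consistent)
loss_inconsistent = regularized_actor_loss(0.5, inconsistent)
```

Because the penalty enters only as an extra loss term, the base actor-critic update is left unchanged, which matches the abstract's point that no simulator or core algorithm changes are needed.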

Paper Structure (16 sections, 18 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Comparison of architecture and performance between SAC and Physics-Informed SAC. (a) The standard SAC framework relies solely on the MuJoCo simulator, resulting in high error in the robotic arm task. (b) The Physics-Informed SAC framework incorporates an Analytic Physics & Constraints module, leading to low error and precise control.
  • Figure 2: Overview of the physics-informed actor–critic reinforcement learning framework. The actor policy generates control actions based on the current system state, while the critic network evaluates action quality by estimating the value function. A physics-informed neural network (PINN), coupled with a high-fidelity simulator (MuJoCo), embeds prior physical laws and system dynamics into the learning loop, providing physically consistent state transitions and constraints. The interaction between data-driven policy optimization and physics-based modeling improves training stability, sample efficiency, and generalization performance compared to purely data-driven reinforcement learning approaches.
  • Figure 3: Qualitative comparison of final end-effector configurations in the FetchReach-v4 environment. The green background indicates the simulated MuJoCo workspace. Top row: Baseline policies often exhibit steady-state error or drift due to lack of structural knowledge. Bottom row: PIPER policies achieve tighter convergence to the target geometry by exploiting the null space of the manipulator dynamics.
  • Figure 4: FetchReach Learning Dynamics. Comparison of the baselines and our physics-informed variants (shaded regions represent $\pm 1$ standard deviation across 5 random seeds). Our PIPER-SAC (orange) demonstrates superior convergence speed by exploiting the inertial null-space. Zoom in for finer detail.
  • Figure 5: Performance Analysis across Contact-Rich Environments. Comparison of Baseline TQC+HER vs. PIPER-TQC. Zoom in for finer detail.
  • ...and 1 more figures