Physics Informed Viscous Value Representations

Hrishikesh Viswanath; Juanwu Lu; S. Talha Bukhari; Damon Conover; Ziran Wang; Aniket Bera

TL;DR

This work proposes a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. It uses the Feynman-Kac theorem to recast the PDE solution as an expectation, which allows a tractable Monte Carlo estimate of the objective and avoids the numerical instability of higher-order gradients.

Abstract

Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. Yet these formulations can be ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iterations. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling a tractable Monte Carlo estimation of the objective that avoids numerical instability in higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation and high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
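
As a toy illustration of the Feynman-Kac idea mentioned in the abstract, the sketch below estimates the solution of a simple one-dimensional second-order PDE by averaging a terminal function over sampled Brownian endpoints, without computing any second derivatives. This is not the paper's objective or code: the PDE (u_t + 0.5 * sigma^2 * u_xx = 0 with terminal condition u(x, T) = g(x)), the function names `feynman_kac_estimate` and `terminal_cost`, and all parameter values are assumptions chosen only because this case has a closed form to compare against.

```python
import random


def feynman_kac_estimate(g, x, horizon, sigma, num_samples=20000, seed=0):
    """Monte Carlo estimate of u(x, 0) where u_t + 0.5*sigma^2*u_xx = 0
    and u(., horizon) = g. By Feynman-Kac, u(x, 0) = E[g(x + sigma * W_T)]."""
    rng = random.Random(seed)
    # sigma * W_T is Gaussian with standard deviation sigma * sqrt(T).
    std = sigma * horizon ** 0.5
    total = 0.0
    for _ in range(num_samples):
        total += g(x + rng.gauss(0.0, std))
    return total / num_samples


def terminal_cost(x):
    # Hypothetical quadratic terminal condition, chosen for its closed form.
    return x * x


x0, horizon, sigma = 1.0, 2.0, 0.5
estimate = feynman_kac_estimate(terminal_cost, x0, horizon, sigma)
# Closed form for a quadratic terminal condition: x^2 + sigma^2 * T.
exact = x0 ** 2 + sigma ** 2 * horizon
print(f"Monte Carlo: {estimate:.3f}, closed form: {exact:.3f}")
```

The estimator only evaluates `g` at sampled points, which is the property that makes an expectation-based form attractive when second-order derivatives of a learned function would be costly or unstable to compute.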

Paper Structure (18 sections, 22 equations, 8 figures, 6 tables)

This paper contains 18 sections, 22 equations, 8 figures, 6 tables.

Introduction
Related Works
Preliminaries
Learning Value Representations
Solution to Optimal Control
Linearization
Value Function Regularization
Experiments
Results
Conclusion
Notations
Additional Derivations
Solution to Optimal Control
Derivation of Linearized HJB Hamiltonian
Feynman-Kac Formula Yields an Upper Bound on the Optimal State Value
...and 3 more sections

Figures (8)

Figure 1: Efficient manipulation via viscosity-based value functions. By integrating viscosity-based geometric regularization on the value function for offline goal-conditioned reinforcement learning, our method successfully executes complex manipulation tasks where the Eikonal baseline (Giammarino et al., 2025) struggles. As shown (Bottom), our approach yields direct, stable trajectories across various tasks, overcoming the erratic and suboptimal paths generated by the baseline (Top).
Figure 2: Qualitative value contour ablation on PointMaze-Large. Col 1 (Original): Baseline fails globally; ours recovers functional structure. Col 2 (VIB): Baseline exhibits severe jitter; ours significantly stabilizes geometry. Col 3 (Dual): Baseline suffers geometric collapse with contours parallel to paths; ours enforces correct path-orthogonality. Col 4 (Eikonal): While Eikonal constraints (applied to VIB and Dual) reduce jitter, they incorrectly align contours parallel to walls. Our viscous formulation strictly preserves orthogonal, geodesic-aligned contours.
Figure 3: Toy example with a single obstacle (blue square) in an arena, with the goal state at the bottom. We analytically solve the exact Eikonal form from Giammarino et al. (2025) and our stochastic HJB PDE (Feynman-Kac) on a toy problem. The second-order HJB PDE produces contours with larger magnitude and curvature near the obstacle, with vectors directed away from it.
Figure 4: Comparison of action probability distributions (pink fans) along a trajectory. (a) Ours: The advantage distributions align parallel to the walls, correctly identifying the safe path through corridors. (b) Eikonal: The distributions often point directly into walls when the goal is geometrically behind them.
Figure 5: Benchmark suite from OGBench. We evaluate our method across a wide spectrum of offline GCRL tasks, ranging from standard geometric navigation (point/ant/humanoid-maze) and high-dimensional manipulation (scene-play, puzzle) to the highly stochastic, pixel-based physics of powderworld. This diversity tests the agent's ability to handle varying degrees of state dimensionality, transition stochasticity, and dynamic complexity.
...and 3 more figures
