Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao

Abstract

Deep reinforcement learning excels in continuous control but often requires extensive exploration, while physics-based models demand complete equations and suffer from cubic complexity. This study proposes Hybrid Energy-Aware Reward Shaping (H-EARS), unifying potential-based reward shaping with energy-aware action regularization. H-EARS constrains action magnitude while balancing task-specific and energy-based potentials via functional decomposition, achieving linear complexity O(n) by capturing dominant energy components without full dynamics. We establish a theoretical foundation including: (1) functional independence for separate task/energy optimization; (2) energy-based convergence acceleration; (3) convergence guarantees under function approximation; and (4) approximate potential error bounds. Lyapunov stability connections are analyzed as heuristic guides. Experiments across baselines show improved convergence, stability, and energy efficiency. Vehicle simulations validate applicability in safety-critical domains under extreme conditions. Results confirm that integrating lightweight physics priors enhances model-free RL without complete system models, enabling transfer from lab research to industrial applications.
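As a rough illustration of the structure described in the abstract, the sketch below combines a potential-based shaping term with an action-magnitude penalty. It is not the authors' implementation: the state layout `[position, velocity]`, the potentials `phi_task` and `phi_energy`, the mixing weight `alpha`, the quadratic penalty form, and the coefficient `lam` are all assumptions chosen for illustration. Each potential is a single pass over the state, which is the sense in which the cost stays linear in the state dimension.

```python
import numpy as np

def phi_task(s, goal=0.0):
    # Hypothetical task potential: negative distance of the position to a goal.
    return -abs(s[0] - goal)

def phi_energy(s, mass=1.0):
    # Hypothetical energy potential: negative kinetic energy of a point mass.
    return -0.5 * mass * s[1] ** 2

def shaped_reward(r_env, s, a, s_next, gamma=0.99, alpha=0.5, lam=0.01):
    """Environment reward + potential-based shaping + action regularization."""
    # Blend the task and energy potentials (the weighting scheme is assumed).
    phi = lambda x: alpha * phi_task(x) + (1.0 - alpha) * phi_energy(x)
    # Potential-based shaping term: depends only on the states s and s_next.
    f_pot = gamma * phi(s_next) - phi(s)
    # Regularization term: depends only on the action a.
    f_reg = -lam * float(np.dot(a, a))
    return r_env + f_pot + f_reg

# Example transition: the agent moves toward the goal and slows down.
s = np.array([1.0, 2.0])
s_next = np.array([0.5, 1.0])
a = np.array([0.3, -0.4])
r = shaped_reward(1.0, s, a, s_next, gamma=1.0, alpha=0.5, lam=0.1)
print(r)
```

In this sketch the shaping term rewards transitions that reduce both the distance to the goal and the kinetic energy, while the penalty grows with the squared action norm regardless of the state.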

Paper Structure

This paper contains 41 sections, 8 theorems, 77 equations, 8 figures, 8 tables, 1 algorithm.

Key Result

Lemma 2.1

The H-EARS reward function admits a functional decomposition into two independent components operating on disjoint domains, with $\text{Dom}(\mathcal{F}_{\text{pot}}) \cap \text{Dom}(\mathcal{F}_{\text{reg}}) = \emptyset$. By standard PBRS theory (Ng et al., 1999), for fixed $\lambda$, any modification to $\Phi$ yields equivalent optimal policies. Conversely, adjusting $\lambda$ directly controls poli
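The policy-invariance property of potential-based shaping that the lemma relies on can be checked numerically on a small tabular problem. The sketch below is an illustration only and does not come from the paper: the random MDP, the potential `phi`, and the helper `greedy_policy` are all hypothetical. It runs value iteration with and without the shaping term $\gamma\Phi(s') - \Phi(s)$ and compares the resulting greedy policies.

```python
import numpy as np

def greedy_policy(P, R, gamma, iters=2000):
    """Value iteration on a tabular MDP.

    P[s, a, t] is the probability of moving from s to t under action a,
    R[s, a, t] is the reward for that transition.
    Returns the greedy policy and the final Q-table.
    """
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        # Expected one-step return for every (state, action) pair.
        Q = np.einsum("sat,sat->sa", P, R + gamma * V[None, None, :])
        V = Q.max(axis=1)
    return Q.argmax(axis=1), Q

rng = np.random.default_rng(0)
n_s, n_a, gamma = 5, 3, 0.9

# Random transition kernel (rows normalized) and random rewards.
P = rng.random((n_s, n_a, n_s))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_s, n_a, n_s))

# Arbitrary potential and the induced shaping term F(s, t) = gamma*phi(t) - phi(s).
phi = rng.normal(size=n_s)
F = gamma * phi[None, None, :] - phi[:, None, None]

pi_base, Q_base = greedy_policy(P, R, gamma)
pi_shaped, Q_shaped = greedy_policy(P, R + F, gamma)

print("same greedy policy:", np.array_equal(pi_base, pi_shaped))
```

Shaping shifts every row of the Q-table by the state-dependent constant $-\Phi(s)$, so the ranking of actions within each state is unchanged. An action-dependent penalty has no such cancellation, which is why it can alter the learned behavior.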

Figures (8)

  • Figure 1: Benchmark performance comparison
  • Figure 2: Ablation performance in Ant-v5 and Hopper-v5
  • Figure 3: Vehicle simulation model in TruckSim
  • Figure 4: Training road height and adhesion coefficient variation parameter settings
  • Figure 5: Test road height and adhesion coefficient variation parameter settings
  • ...and 3 more figures

Theorems & Definitions (20)

  • Lemma 2.1: Functional Independence of Shaping and Regularization
  • proof
  • Theorem 2.2: Regularization as Stability Enforcement
  • proof
  • Remark 2.3: Practical System Classification
  • Theorem 2.4: Energy-Based Convergence Acceleration via Mechanical Stability
  • proof
  • Remark 2.5: Physical Interpretation and Scope
  • Remark 2.6: Physical Self-Consistency of Energy Potentials
  • Proposition 2.7: Dual-Potential Decomposition Necessity
  • ...and 10 more