A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems

Manan Tayal, Aditya Singh, Shishir Kolathaya, Somil Bansal

TL;DR

This work tackles co-optimizing safety and performance for nonlinear autonomous systems by formulating a state-constrained optimal control problem (SC-OCP) and solving it via an epigraph reformulation. A physics-informed neural network learns the auxiliary value function $\hat{V}$ satisfying the HJB-PDE, while conformal prediction provides high-confidence safety guarantees and performance bounds. The final value function $V_{\theta}$ and policy $\pi_{\theta}$ are recovered by enforcing a safety margin $\delta$ and optimizing over an augmented state, ensuring robust, safe, and near-optimal control. Experiments on 2D boat navigation, 8D pursuer-evader tracking, and 20D multi-agent navigation demonstrate scalable, real-time performance with provable safety guarantees, outperforming CRL and safety-filter baselines in both safety and efficiency.

Abstract

As autonomous systems become more ubiquitous in daily life, ensuring high performance with guaranteed safety is crucial. However, safety and performance could be competing objectives, which makes their co-optimization difficult. Learning-based methods, such as Constrained Reinforcement Learning (CRL), achieve strong performance but lack formal safety guarantees due to safety being enforced as soft constraints, limiting their use in safety-critical settings. Conversely, formal methods such as Hamilton-Jacobi (HJ) Reachability Analysis and Control Barrier Functions (CBFs) provide rigorous safety assurances but often neglect performance, resulting in overly conservative controllers. To bridge this gap, we formulate the co-optimization of safety and performance as a state-constrained optimal control problem, where performance objectives are encoded via a cost function and safety requirements are imposed as state constraints. We demonstrate that the resultant value function satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which we approximate efficiently using a novel physics-informed machine learning framework. In addition, we introduce a conformal prediction-based verification strategy to quantify the learning errors, recovering a high-confidence safety value function, along with a probabilistic error bound on performance degradation. Through several case studies, we demonstrate the efficacy of the proposed framework in enabling scalable learning of safe and performant controllers for complex, high-dimensional autonomous systems.
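
The formulation described above can be written out as follows. This is a sketch in assumed notation, not taken from the summary itself: dynamics $f$, running cost $\ell$, terminal cost $\phi$, a safety function $g$ with $g(x) \leq 0$ on safe states, horizon $T$, and augmented state $\hat{x} = (x, z)$, where $z$ acts as a cost budget. The state-constrained problem is

$$
V(t, x) = \min_{u(\cdot)} \; \int_t^T \ell(s, x(s))\, ds + \phi(x(T))
\quad \text{s.t.} \quad \dot{x}(s) = f(x(s), u(s)), \;\; g(x(s)) \leq 0 \;\; \forall s \in [t, T].
$$

A standard epigraph reformulation replaces the constrained minimization with an unconstrained auxiliary value function over the augmented state,

$$
\hat{V}(t, x, z) = \min_{u(\cdot)} \; \max\Big\{ \max_{s \in [t, T]} g(x(s)), \;\; \int_t^T \ell(s, x(s))\, ds + \phi(x(T)) - z \Big\},
\qquad
V(t, x) = \min \{ z \geq 0 : \hat{V}(t, x, z) \leq 0 \}.
$$

Under this reading, the learned counterpart with a safety margin would take the form $V_{\theta}(t, x) = \min \{ z \geq 0 : \hat{V}_{\theta}(t, x, z) \leq \delta \}$, which is consistent with the set $\mathcal{S}_{\delta}$ used in the verification theorem, though the exact definition should be checked against the paper.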

Paper Structure

This paper contains 33 sections, 3 theorems, 57 equations, 12 figures, 5 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $\mathcal{S}_{\delta}$ be the set of states satisfying $\hat{V}_{\theta}(0, \hat{x}) \leq \delta$, and let $(0, \hat{x}_i)_{i=1, \dots, N_s}$ be $N_s$ i.i.d. samples from $\mathcal{S}_{\delta}$. Define $\alpha_{\delta}$ as the safety error rate among these $N_s$ samples for a given $\delta$ level … where $l = \lfloor (N_s+1)\alpha_{\delta} \rfloor$. Then, with probability at least $1 - \be
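
The two quantities the theorem statement does define, $\alpha_{\delta}$ and $l$, can be computed as in the minimal sketch below. The function name `safety_error_stats`, the rejection-style filtering of candidate samples into $\mathcal{S}_{\delta}$, and the boolean `violated` labels (for example, obtained by rolling out the learned policy and checking the constraint) are assumptions for illustration. The elided part of the theorem, including the final probabilistic bound, is not reproduced here.

```python
import numpy as np

def safety_error_stats(v_hat, violated, delta):
    """Return (N_s, alpha_delta, l) for samples with v_hat <= delta.

    v_hat    : learned auxiliary values at candidate augmented states (assumed given)
    violated : True where the sample turned out unsafe (assumed labels)
    delta    : level defining the set S_delta
    """
    v_hat = np.asarray(v_hat, dtype=float)
    violated = np.asarray(violated, dtype=bool)

    in_set = v_hat <= delta              # keep only samples inside S_delta
    n_s = int(in_set.sum())
    if n_s == 0:
        raise ValueError("no samples fall inside S_delta")

    k = int(violated[in_set].sum())      # number of unsafe samples in S_delta
    alpha = k / n_s                      # empirical safety error rate
    l = ((n_s + 1) * k) // n_s           # floor((N_s + 1) * alpha), in exact integer arithmetic
    return n_s, alpha, l

# Toy data: five candidates, one of which is unsafe.
n_s, alpha, l = safety_error_stats(
    v_hat=[-0.5, -0.2, 0.1, -0.3, -0.1],
    violated=[False, True, False, False, False],
    delta=-0.05,
)
```

Computing $l$ with integer arithmetic avoids floating-point rounding when $(N_s+1)\alpha_{\delta}$ lands close to an integer.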

Figures (12)

  • Figure 1: Overview of the proposed approach: The methodology is organized into four steps. The first step involves training the auxiliary value function, $\hat{V}_{\theta}$, using a physics-informed machine learning framework. The second step applies a conformal prediction approach for safety verification of the learned $\hat{V}_{\theta}$. In the third step, the final value function $V_{\theta}$ and the optimal safe and performant policy $\pi_{\theta}$ are inferred. The fourth step quantifies the performance of $V_{\theta}$ through a second conformal prediction procedure.
  • Figure 2: This figure presents a comparative study between all the methods based on our evaluation metrics. The top plot illustrates the mean percentage increase in cumulative cost relative to our method for each baseline, demonstrating that our approach consistently incurs lower costs, with the gap widening as system complexity grows. The bottom plot depicts the safety rates, showing that our method maintains a $100\%$ safety rate, while baselines that encourage safety rather than enforcing it (like MPPI and C-SAC) achieve lower rates. MPPI-CBF also attains $100\%$ safety but at the expense of performance. Overall, our method uniquely balances both safety and performance, whereas the baselines compromise on at least one aspect.
  • Figure 3: Trajectories from two distinct initial states are shown, with dark grey circles representing obstacles and the green dot indicating the goal at $[1.5, 0]^T$. Notably, our method is the only one that successfully approaches the goal while adhering to safety constraints.
  • Figure 4: Trajectories from two distinct initial states are depicted, with dark grey circles representing obstacles and purple trajectories indicating the evader's path, with arrows showing its direction of motion. Our method successfully tracks the evader while avoiding collisions, whereas all other methods fail to maintain safety, struggle to track the evader, or both.
  • Figure 5: Snapshots of multi-agent navigation trajectories at different times using the proposed method. Agents are represented as circles with radius $R$, indicating the minimum safe distance they must maintain from each other. Smaller dots mark their respective goals. The trajectories show that agents proactively maintain long-horizon safety by adjusting their paths to avoid close encounters, rather than enforcing safety reactively, which could lead to suboptimal behaviors. Finally, the agents reach their respective goals within the specified time horizon.
  • ...and 7 more figures
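
The third step in Figure 1, inferring $V_{\theta}$ from the learned $\hat{V}_{\theta}$, can be illustrated with a simple grid search. This is a sketch under stated assumptions rather than the authors' implementation: it assumes $V_{\theta}(t, x)$ is the smallest budget $z$ for which $\hat{V}_{\theta}(t, x, z) \leq \delta$, and `recover_value` and `toy_v_hat` are hypothetical names, with `toy_v_hat` standing in for a trained network.

```python
import numpy as np

def recover_value(v_hat_fn, t, x, delta, z_grid):
    """Smallest z on z_grid with v_hat_fn(t, x, z) <= delta, or None if no z qualifies."""
    for z in np.sort(np.asarray(z_grid, dtype=float)):
        if v_hat_fn(t, x, z) <= delta:
            return float(z)
    return None  # no budget on the grid certifies the state at this margin

def toy_v_hat(t, x, z):
    # Hypothetical stand-in for a learned auxiliary value function:
    # the larger of a safety term (x - 2) and cost-minus-budget (|x| - z).
    return max(x - 2.0, abs(x) - z)

z_grid = np.linspace(0.0, 3.0, 7)   # candidate budgets 0, 0.5, ..., 3
v_nominal = recover_value(toy_v_hat, 0.0, 1.0, delta=0.0, z_grid=z_grid)
v_margin = recover_value(toy_v_hat, 0.0, 1.0, delta=-0.5, z_grid=z_grid)
v_unsafe = recover_value(toy_v_hat, 0.0, 3.0, delta=0.0, z_grid=z_grid)
```

In this toy setting a tighter margin raises the recovered value, and a state whose safety term is positive has no valid budget at all. A grid search is used instead of bisection so that the sketch does not rely on $\hat{V}_{\theta}$ being monotone in $z$.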

Theorems & Definitions (5)

  • Theorem 3.1: Safety Verification Using Conformal Prediction
  • Theorem 3.2: Performance Quantification Using Conformal Prediction
  • Proof
  • Lemma 1: Split Conformal Prediction [angelopoulos2022gentle]
  • Proof