Table of Contents
Fetching ...

Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems

Joseph A. Vincent, Mac Schwager

TL;DR

This work tackles safety verification and stability analysis for dynamical systems that incorporate learned components by introducing Reachable Polyhedral Marching (RPM), an exact method to convert ReLU networks into explicit piecewise-affine (PWA) representations. RPM enumerates affine regions through a connected walk, yielding an exact forward and backward reachability analysis without Lyapunov functions, and enabling computation of control invariant sets and regions of attraction (ROAs) for neural-network dynamics. The authors prove correctness via an activation-flipping theorem, demonstrate accelerated backward reachability by exploiting connectivity (including a homeomorphism condition), and validate the approach on learned van der Pol oscillator, pendulum, and image-based taxi control, achieving substantial speedups and enabling nonconvex ROAs. They also introduce a practical test for ReLU network invertibility (homeomorphism) and show how RPM can scale to large PWA representations, especially when combined with tools like MPT3. The results highlight the method’s potential to provide exact, constructive safety and stability guarantees for learned control systems in robotics and autonomous platforms.

Abstract

Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. With these components being learned from data, it is important to be able to analyze what behaviors were learned and how this affects closed-loop performance. In this paper we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus our attention on feedforward neural networks with the rectified linear unit (ReLU) activation, which are known to implement continuous piecewise-affine (PWA) functions. We describe the Reachable Polyhedral Marching (RPM) algorithm for enumerating the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that we find these sets incrementally, without Lyapunov-based tools. In our examples we demonstrate the ability of our approach to find non-convex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides. We show this acceleration to lead to a 15x speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.

Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems

TL;DR

This work tackles safety verification and stability analysis for dynamical systems that incorporate learned components by introducing Reachable Polyhedral Marching (RPM), an exact method to convert ReLU networks into explicit piecewise-affine (PWA) representations. RPM enumerates affine regions through a connected walk, yielding an exact forward and backward reachability analysis without Lyapunov functions, and enabling computation of control invariant sets and regions of attraction (ROAs) for neural-network dynamics. The authors prove correctness via an activation-flipping theorem, demonstrate accelerated backward reachability by exploiting connectivity (including a homeomorphism condition), and validate the approach on learned van der Pol oscillator, pendulum, and image-based taxi control, achieving substantial speedups and enabling nonconvex ROAs. They also introduce a practical test for ReLU network invertibility (homeomorphism) and show how RPM can scale to large PWA representations, especially when combined with tools like MPT3. The results highlight the method’s potential to provide exact, constructive safety and stability guarantees for learned control systems in robotics and autonomous platforms.

Abstract

Neural networks are increasingly used in robotics as policies, state transition models, state estimation models, or all of the above. With these components being learned from data, it is important to be able to analyze what behaviors were learned and how this affects closed-loop performance. In this paper we take steps toward this goal by developing methods for computing control invariant sets and regions of attraction (ROAs) of dynamical systems represented as neural networks. We focus our attention on feedforward neural networks with the rectified linear unit (ReLU) activation, which are known to implement continuous piecewise-affine (PWA) functions. We describe the Reachable Polyhedral Marching (RPM) algorithm for enumerating the affine pieces of a neural network through an incremental connected walk. We then use this algorithm to compute exact forward and backward reachable sets, from which we provide methods for computing control invariant sets and ROAs. Our approach is unique in that we find these sets incrementally, without Lyapunov-based tools. In our examples we demonstrate the ability of our approach to find non-convex control invariant sets and ROAs on tasks with learned van der Pol oscillator and pendulum models. Further, we provide an accelerated algorithm for computing ROAs that leverages the incremental and connected enumeration of affine regions that RPM provides. We show this acceleration to lead to a 15x speedup in our examples. Finally, we apply our methods to find a set of states that are stabilized by an image-based controller for an aircraft runway control problem.
Paper Structure (31 sections, 5 theorems, 51 equations, 6 figures, 6 tables, 4 algorithms)

This paper contains 31 sections, 5 theorems, 51 equations, 6 figures, 6 tables, 4 algorithms.

Key Result

Theorem 1

For a given region $P_k$, each neuron defines a hyperplane $\mathbf{a}_{k,ij}^\top \mathbf{x} = b_{k,ij}$, as given by (eq:normalize). Suppose regions $P_k$ and $P_{k'}$ neighbor each other and are separated by $\mathbf{a}_\eta^\top \mathbf{x} = b_\eta$ where $P_{k} \subset \{\mathbf{x} \mid \mathbf

Figures (6)

  • Figure 1: RPM's incremental enumeration of each affine region in a ReLU network. In this case the ReLU network was trained to approximate a quadratic function. Each new affine region enumerated is connected to a previous region.
  • Figure 2: This figure illustrates the main steps of the RPM algorithm (Algorithm \ref{['Algo:cell_enumeration']}). (a) The algorithm is initialized with a point in the input space that is evaluated by the ReLU network to determine the activation pattern $\lambda$. (b) Then the polyhedron corresponding to $\lambda$ is computed using Algorithm \ref{['Algo:ap_to_poly']}. (c) Next, the essential constraints for the polyhedron, which in turn connect it to unexplored neighboring regions (indicated by blue arrows), are determined using Algorithm \ref{['Algo:essential hrep']}. (d) Then the activation pattern for a neighboring region (indicated by the green arrow) is found using Algorithm \ref{['Algo:neighbor_ap']}. The process then repeats by recursively exploring neighboring regions in this manner until the entire input set is explored.
  • Figure 3: 5, 10, 15, 20, 25, and 30-step regions of attraction (ROAs) for the learned van der Pol system. The learned dynamics network is continuously invertible, implying that preimages of connected sets are connected. This property restricts the space explored with RPM, resulting in a 15x speedup in ROA computation. The black line represents the boundary of the true continuous-time maximal ROA.
  • Figure 4: A control invariant set for a learned, torque controlled pendulum system. The domain of the dynamics is $\theta \in (-180\degree, 180\degree)$, $\dot{\theta} \in [-180 \degree/\text{s}, 180 \degree/\text{s}]$, $u \in [-5, 5]$ N-m, where $0\degree$ is upright. Note that we restrict the pendulum so it is not allowed to pass through the down position. When RPM is paired with MPT3 we are able to compute complicated control invariant sets of learned systems, like this unconnected set, as opposed to convex approximations. Within each connected component a sample state is illustrated to help visualize the possible pendulum configurations.
  • Figure 5: The X-Plane 11 flight simulator is used to generate realistic image observations (left) that are then downsampled (right). The downsampled images are used to learn an image-based controller, as well as to learn an observation model from state to image observation. Putting these two components together along with a learned dynamics network results in closed-loop system dynamics represented as a single ReLU network. These images and learned controller/observation models are from verify_gan.
  • ...and 1 more figures

Theorems & Definitions (19)

  • Definition 1: (Convex) Polyhedron
  • Definition 2: Polyhedral Tessellation
  • Definition 3: Neighboring Polyhedra
  • Definition 4: Piecewise Affine (PWA) Function
  • Definition 5: $t$-Step Forward Reachable Set
  • Definition 6: $t$-Step Backward Reachable Set
  • Definition 7: Control Invariant Set
  • Definition 8: (Invariant) Region of Attraction
  • Definition 9: Duplicate Constraints
  • Definition 10: Redundant & Essential Constraints
  • ...and 9 more