Guarantees on Robot System Performance Using Stochastic Simulation Rollouts

Joseph A. Vincent, Aaron O. Feldman, Mac Schwager

TL;DR

We provide finite-sample performance guarantees for control policies executed on stochastic robotic systems, bounding the expected value, value-at-risk, and conditional value-at-risk of the trajectory cost, as well as the probability of failure in a sparse cost setting.

Abstract

We provide finite-sample performance guarantees for control policies executed on stochastic robotic systems. Given an open- or closed-loop policy and a finite set of trajectory rollouts under the policy, we bound the expected value, value-at-risk, and conditional-value-at-risk of the trajectory cost, and the probability of failure in a sparse cost setting. The bounds hold, with user-specified probability, for any policy synthesis technique and can be seen as a post-design safety certification. Generating the bounds only requires sampling simulation rollouts, without assumptions on the distribution or complexity of the underlying stochastic system. We adapt these bounds to also give a constraint satisfaction test to verify safety of the robot system. We provide a thorough analysis of the bound sensitivity to sim-to-real distribution shifts and provide results for constructing robust bounds that can tolerate some specified amount of distribution shift. Furthermore, we extend our method to apply when selecting the best policy from a set of candidates, requiring a multi-hypothesis correction. We show the statistical validity of our bounds in the Ant, Half-cheetah, and Swimmer MuJoCo environments and demonstrate our constraint satisfaction test with the Ant. Finally, using the 20 degree-of-freedom MuJoCo Shadow Hand, we show the necessity of the multi-hypothesis correction.

Paper Structure (33 sections, 15 theorems, 95 equations, 9 figures)


Key Result

Theorem 1

Consider $\tau, \delta \in (0,1)$ and $n$ IID cost samples $J_{1:n}$, and let $k$ be the smallest index such that $\textup{Bin}(k-1;n,\tau) \ge 1 - \delta$. We have the following probabilistic upper bound on $\textup{VaR}_\tau(J)$, $$\overline{\textup{VaR}}_\tau = J_{(k)},$$ which has the property $$\Pr[\textup{VaR}_\tau(J) \le \overline{\textup{VaR}}_\tau] \ge 1 - \delta.$$ A feasible value for $k$ exists when $n \geq \lceil \ln(\delta) / \ln(\tau) \rceil$, i.e., $n$ is large enough to ensure $\textup{Bin}(n-1;n,\tau) \ge 1 - \delta$.
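
The index selection in Theorem 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes $\textup{Bin}(\cdot;n,\tau)$ denotes the binomial CDF and that the bound is the $k$-th order statistic of the sampled costs. The function names `binom_cdf`, `min_samples`, and `var_upper_bound` are hypothetical.

```python
from math import ceil, comb, log
import random


def binom_cdf(j, n, p):
    """Pr[X <= j] for X ~ Binomial(n, p), computed by direct summation."""
    if j < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(j, n) + 1))


def min_samples(tau, delta):
    """Smallest n for which a feasible index k exists (tau**n <= delta)."""
    return ceil(log(delta) / log(tau))


def var_upper_bound(costs, tau, delta):
    """Return (k, J_(k)), with k the smallest index s.t. Bin(k-1; n, tau) >= 1 - delta.

    Assumption: the bound is the k-th smallest sampled cost (1-indexed).
    """
    n = len(costs)
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, tau) >= 1 - delta:
            return k, sorted(costs)[k - 1]
    raise ValueError("too few samples: need n >= ceil(ln(delta) / ln(tau))")


if __name__ == "__main__":
    # Stand-in for rollout costs; a real use would evaluate the cost of each rollout.
    rng = random.Random(0)
    costs = [rng.gauss(0.0, 1.0) for _ in range(100)]
    k, bound = var_upper_bound(costs, tau=0.7, delta=0.2)
    print(f"k = {k}, VaR upper bound = {bound:.3f}")
```

Direct summation of the binomial CDF is adequate for sample sizes in the hundreds; for much larger $n$ a numerically stable library routine would be the safer choice.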

Figures (9)

  • Figure 1: Overview of our method for bounding performance for a single policy using a stochastic simulator. The policy is executed in simulation $n$ times to collect trajectory rollouts. The cost or constraint function is evaluated for each rollout and these samples are used to form a distribution-free upper bound on a given performance measure (expected value, value-at-risk, conditional-value-at-risk, or probability of failure) that is guaranteed to hold with probability at least $1-\delta$. These probabilistic bounds may also be used to ensure safety by testing constraint satisfaction for the performance measures. The finite-sample bound guarantee ensures that these tests incorrectly accept a policy as safe with at most $\delta$ probability. We demonstrate this pipeline for several MuJoCo environments and extend the method to compare multiple policies for manipulating an egg of uncertain mass and friction.
  • Figure 2: Visualization of the expected value, $\textup{VaR}_\tau$, and $\textup{CVaR}_\tau$ for an example distribution. Classic stochastic optimal control and reinforcement learning both seek to minimize the expected value of the total cost distribution. Risk-sensitive stochastic optimal control and reinforcement learning consider other measures of performance, such as VaR or CVaR of the cost.
  • Figure 3: Empirical validation of the bounds for $\textup{VaR}_\tau$, $\mathbb{E}$, $\textup{CVaR}_\tau$, and $q$ (one subfigure each, in that order). Each plot shows, for a single policy, the empirical distribution of total cost $J$ (blue), along with the distribution of the bound (gray). The blue vertical line shows the true measure we seek to bound and the gray vertical line shows the $\delta$ quantile of the bound distribution. Since our theoretical results ensure the bounds hold with probability $\ge 1 - \delta$, the $\delta$ quantile of the bound distribution should be at least the true measure. Visually, the gray line should therefore lie to the right of the blue line, as validated in each plot. The cost histogram was generated using $10{,}000$ simulations and the bound histogram was generated by computing the bound $1000$ separate times. In each case $n=100$, $\delta = 0.2$, and $\tau = 0.7$. To demonstrate that the bounds are agnostic to the dynamics, we used the Half Cheetah ($\textup{VaR}_\tau$), Ant ($\mathbb{E}$ and $q$), and Swimmer ($\textup{CVaR}_\tau$) MuJoCo environments.
  • Figure 4: Visualization and empirical validation of the constraint test theorem, applied to testing whether a chance constraint holds. Each curve shows the probability of accepting that the chance constraint holds (y-axis) as a function of the true probability that the underlying trajectory constraint is satisfied (x-axis). The validity of the theorem is demonstrated by each curve lying below $\delta$ whenever the chance constraint fails to hold, i.e., whenever $\Pr[\textup{constraint satisfied}]$ is below $\tau$. Visually, the false acceptance probability is guaranteed to be below $\delta$, so the curves avoid the region shaded in red. Here we use $\delta=0.2$ (horizontal line) and $\tau=0.7$ (vertical line). As the sample size $n$ increases, the curve approaches a step function, i.e., we obtain a perfect discriminator. In addition, for $n=10$ we plot empirical results from the Ant environment, where the chance constraint we seek to assess is that the vertical position of the Ant torso stays within $[0.5, 1]$ throughout the trajectory with probability $0.7$. The $x$ and $y$ coordinates of each orange dot are separately estimated using an average taken over 1000 simulation runs.
  • Figure 5: Confidence level of the VaR bound as the sim-to-real mismatch is varied. The parameter $\sigma$ controls the standard deviation of the initial state distribution in the Half Cheetah environment. When $\sigma > \sigma_{sim}$, the true confidence level of the bound degrades; when $\sigma < \sigma_{sim}$, it strengthens. In blue we plot the empirical confidence levels estimated by varying $\sigma$, using $10{,}000$ simulations to estimate $\textup{VaR}_{\tau}(J_{true})$ and $1000$ realizations of $\overline{\textup{VaR}}_{\tau}$. In orange we plot the minimum confidence level guaranteed by the VaR sensitivity result. The theoretical sensitivity guarantee is valid, i.e., always lower than the empirical confidence, but it is pessimistic when $\sigma < \sigma_{sim}$: even though the one-sided KS distance satisfies $\alpha > 0$, the distribution shift actually results in a higher confidence level.
  • ...and 4 more figures
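
The acceptance curves described for Figure 4 can be reproduced qualitatively with a simple one-sided binomial test. This is a sketch under stated assumptions, not the paper's constraint test: it assumes the test accepts when at least $m$ of the $n$ rollouts satisfy the trajectory constraint, with $m$ the smallest count such that $\Pr[\textup{Binomial}(n,\tau) \ge m] \le \delta$. The function names are hypothetical.

```python
from math import comb


def binom_sf(m, n, p):
    """Pr[X >= m] for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(max(m, 0), n + 1))


def acceptance_threshold(n, tau, delta):
    """Smallest m with Pr[Binomial(n, tau) >= m] <= delta (assumed test design).

    Returns n + 1 when no such m exists, meaning the test never accepts.
    """
    for m in range(n + 1):
        if binom_sf(m, n, tau) <= delta:
            return m
    return n + 1


def acceptance_probability(p_true, n, tau, delta):
    """Probability the test accepts when the constraint truly holds w.p. p_true."""
    return binom_sf(acceptance_threshold(n, tau, delta), n, p_true)


if __name__ == "__main__":
    # Settings taken from the Figure 4 caption: delta = 0.2, tau = 0.7.
    for n in (10, 100):
        curve = [acceptance_probability(p / 10, n, 0.7, 0.2) for p in range(11)]
        print(n, [round(v, 3) for v in curve])
```

Because $\Pr[\textup{Binomial}(n,p) \ge m]$ is increasing in $p$, the acceptance probability stays at or below $\delta$ for every $p < \tau$, and the curve steepens toward a step function as $n$ grows.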

Theorems & Definitions (29)

  • Definition 1: Value-at-Risk
  • Definition 2: Expected Value
  • Definition 3: Conditional Value-at-Risk
  • Definition 4: Failure Probability
  • Definition 5: Order Statistics
  • Definition 6: Binomial Distribution
  • Theorem 1: VaR Bound
  • Definition 7: DKW Gap
  • Theorem 2: Expected Value Bound
  • Theorem 3: CVaR Bound
  • ...and 19 more
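
The listing names a DKW gap (Definition 7) and an expected value bound (Theorem 2) without reproducing their statements. The sketch below shows one standard way such a bound can be built, and may differ from the paper's exact form. It assumes the one-sided Dvoretzky-Kiefer-Wolfowitz inequality with gap $\epsilon = \sqrt{\ln(1/\delta)/(2n)}$ (valid for $\delta \le 0.5$) and a known upper limit `cost_max` on the cost. Both the limit and the function names are hypothetical.

```python
from math import log, sqrt


def dkw_gap(n, delta):
    """One-sided DKW gap: the empirical CDF exceeds the true CDF by more than
    this amount anywhere with probability at most delta (requires delta <= 0.5)."""
    if not 0 < delta <= 0.5:
        raise ValueError("one-sided DKW with this gap requires 0 < delta <= 0.5")
    return sqrt(log(1 / delta) / (2 * n))


def expected_value_upper_bound(costs, cost_max, delta):
    """Upper bound on E[J] holding w.p. >= 1 - delta, assuming J <= cost_max.

    Shifts eps of probability mass from the smallest samples to cost_max,
    i.e., takes the mean of the lower CDF envelope max(F_n - eps, 0).
    """
    xs = sorted(costs)
    n = len(xs)
    if xs[-1] > cost_max:
        raise ValueError("a sampled cost exceeds the assumed limit cost_max")
    eps = min(dkw_gap(n, delta), 1.0)
    bound = eps * cost_max  # mass moved to the worst-case cost
    prev = 0.0
    for i, x in enumerate(xs, start=1):
        cur = max(i / n - eps, 0.0)  # envelope value just after the i-th sample
        bound += (cur - prev) * x
        prev = cur
    return bound
```

With $\epsilon = 0$ the expression reduces to the sample mean, so the gap directly measures how much finite-sample uncertainty inflates the estimate.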