Repeatable and Reliable Efforts of Accelerated Risk Assessment in Robot Testing

Linda Capito; Guillermo A. Castillo; Bowen Weng

TL;DR

Risk assessment of robots in controlled environments requires repeatability across trials and reliability across diverse subjects. The paper formalizes $β$-repeatability and $γ$-reliability for sampling-based risk estimation with importance sampling (IS) and proposes an accelerated testing algorithm that is provably repeatable and reliable, with a guarantee tied to the KL divergence between the nominal and importance distributions. An adaptation of the reproducible statistical query is incorporated to stabilize the outputs, and the sample-size bounds tie testing effort to distributional divergence and tail behavior. Empirical demonstrations on an inverted pendulum pushover and a pushover of the legged robot Rabbit show near-perfect repeatability and reliability, whereas termination based on the relative half-width (RHW) yields non-repeatable results. The work enables standardized, fair, and efficient robot risk testing across vendors with provable guarantees, and it may extend to performance measures other than risk.
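The summary above does not show how the reproducible statistical query is adapted. As a rough illustration of the general idea only, the sketch below snaps an estimate to a grid whose offset comes from randomness shared between testers, so two runs with slightly different raw estimates usually report the same value. The function name `reproducible_round`, the grid width, and the seed-sharing scheme are assumptions for this example, not the paper's procedure.

```python
import random


def reproducible_round(estimate, width, shared_seed):
    """Snap `estimate` to the nearest point of a randomly offset grid.

    Hypothetical helper: the grid spacing `width` and the use of a shared
    integer seed are illustrative assumptions, not taken from the paper.
    """
    # Every tester derives the same offset from the shared seed.
    offset = random.Random(shared_seed).uniform(0.0, width)
    # Grid points are offset + k * width; pick the nearest one.
    k = round((estimate - offset) / width)
    return offset + k * width


# Two testers obtain slightly different raw estimates of the same risk.
tester_a = reproducible_round(0.1032, width=0.01, shared_seed=7)
tester_b = reproducible_round(0.1041, width=0.01, shared_seed=7)
print(tester_a, tester_b)
```

The two reported values differ only when a grid boundary happens to fall between the raw estimates, which becomes less likely as the raw estimates get closer relative to the grid width.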

Abstract

Risk assessment of a robot in controlled environments, such as laboratories and proving grounds, is a common means to assess, certify, validate, verify, and characterize the robot's safety performance before, during, and even after its commercialization in the real world. A standard testing program that acquires the risk estimate is expected to be (i) repeatable, such that different stakeholders obtain similar risk assessments of the same testing subject across multiple trials or attempts with similar testing effort, and (ii) reliable against a variety of testing subjects produced by different vendors and manufacturers. Both repeatability and reliability are fundamental to a testing algorithm's validity, fairness, and practical feasibility, especially for standardization. However, these properties are rarely satisfied or ensured, especially as the subject robots become more complex, uncertain, and varied. This issue was present in traditional risk assessments through Monte-Carlo sampling, and it remains a bottleneck for recent accelerated risk assessment methods, primarily those using importance sampling. This study aims to enhance existing accelerated testing frameworks by proposing a new algorithm that provably integrates repeatability and reliability with the already established formality and efficiency. It also features demonstrations assessing the risk of instability from frontal impacts, initiated by push-over disturbances on a controlled inverted pendulum and on the 7-DoF planar bipedal robot Rabbit under various control algorithms.
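To make the contrast between Monte-Carlo sampling and importance sampling concrete, here is a minimal sketch on a toy problem rather than on a robot. It assumes a scalar disturbance drawn from a standard normal nominal distribution $p$, a "failure" whenever the disturbance exceeds a threshold, and an importance distribution $q$ that is a unit-variance normal centered at the threshold. All names and numbers are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance normal distribution centered at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def crude_monte_carlo(n, threshold, rng):
    """Estimate P(X > threshold) by sampling X from the nominal p = N(0, 1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_sampling(n, threshold, rng):
    """Estimate the same probability by sampling from q = N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            # Reweight each failure by the likelihood ratio p(x) / q(x).
            total += normal_pdf(x, 0.0) / normal_pdf(x, threshold)
    return total / n


rng = random.Random(0)
threshold = 3.0
exact = 0.5 * math.erfc(threshold / math.sqrt(2.0))  # closed form for N(0, 1)
mc_estimate = crude_monte_carlo(20_000, threshold, rng)
is_estimate = importance_sampling(20_000, threshold, rng)
print(exact, mc_estimate, is_estimate)
```

With the same sample budget, the crude estimate rests on only a handful of observed failures, while the importance-sampling estimate sees failures in roughly half of its samples. That gain in efficiency is what accelerated testing exploits. It does not by itself make the result repeatable across runs, which is the gap the paper addresses.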

Paper Structure (16 sections, 1 theorem, 6 equations, 5 figures, 3 algorithms)

Introduction
Accelerated testing for risk estimate
Main Contributions
Preliminaries and Problem Formulation
The testing system
Sampling-based risk estimate
Repeatable & reliable risk estimate
A direct adaptation of reproducible statistical query
Main Proposal
The main algorithm, theorem, and proof
Discussions
Experiments
The inverted pendulum pushover
The legged-robot pushover
Conclusion
...and 1 more section

Key Result

Theorem

Given $\bar{\tau}\in (0,1]$, Algorithm \ref{alg:r2_tight} is $\beta$-repeatable for $\beta \in (0,1)$ satisfying $|\widehat{\mathcal{TE}}_{IS}(\cdot)-r^*|\leq \tau\in \mathbb{R}_{>0}$ except with probability $2e^{-2n\tau^2}$, where $n=e^{D(p \mid\mid q)+c}$ for some $c\in\mathbb{R}_{\geq0}$. Moreover, for all imp and some $c \in [0, -D(p \mid\mid q) + \log\gamma]$, it is also $\gamma$-reliable.
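The quantities in the theorem can be evaluated directly. The sketch below computes the sample size $n=e^{D(p \mid\mid q)+c}$, the failure probability bound $2e^{-2n\tau^2}$, and the largest $c$ allowed by the reliability condition $c \leq -D(p \mid\mid q) + \log\gamma$, which amounts to $n \leq \gamma$. The function names, the rounding of $n$ up to an integer, and the example values of $D$, $\gamma$, and $\tau$ are assumptions chosen for illustration. They are not taken from the paper's experiments.

```python
import math


def sample_size(kl_divergence, c):
    """n = exp(D(p || q) + c), rounded up to an integer (rounding is an assumption)."""
    return math.ceil(math.exp(kl_divergence + c))


def deviation_failure_bound(n, tau):
    """Bound 2 * exp(-2 * n * tau^2) on the probability that the estimate misses r* by more than tau."""
    return 2.0 * math.exp(-2.0 * n * tau ** 2)


def max_slack_for_reliability(kl_divergence, gamma):
    """Largest c in [0, -D(p || q) + log(gamma)]; a negative value means no valid c exists."""
    return math.log(gamma) - kl_divergence


# Illustrative values only.
kl = 3.0       # assumed D(p || q)
gamma = 1e5    # assumed reliability budget
tau = 0.01     # assumed accuracy tolerance

c_max = max_slack_for_reliability(kl, gamma)
n = sample_size(kl, c_max)
bound = deviation_failure_bound(n, tau)
print(c_max, n, bound)
```

Choosing $c$ at its upper limit spends the whole budget $\gamma$ on samples, which makes the failure bound as small as the reliability condition permits. A smaller $c$ trades a looser bound for less testing effort.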

Figures (5)

Figure 1: The repeatability and reliability issues of NADE, an importance-sampling-inspired ADS risk assessment algorithm \cite{feng2021intelligent}, revealed through an extended re-implementation of the open-source code \cite{nadegithub}. Three ADS algorithms are tested: AV2 (a default subject from the original proposal \cite{feng2021intelligent}) and two customized algorithms referred to as AV3 and AV4. The hyper-parameter $s_r$ is a threshold on the relative half-width of the risk estimate, which is closely related to a widely adopted empirical termination condition for importance-sampling-based testing algorithms (see Section \ref{sec:prob} for more details).
Figure 2: The inverted pendulum push-over task (left) and the legged-robot push-over task with Rabbit \cite{castillo2019reinforcement, gong2021one} (right). Three controllers are tested for the inverted pendulum push-over task. Two locomotion controllers are evaluated for the Rabbit case.
Figure 3: Repeatability and reliability comparison between Algorithm \ref{alg:r2_tight} and Algorithm \ref{alg:is_testing} with the RHW-based termination condition on the inverted pendulum push-over task against 3 different controllers (LQR, NMPC, and PID). Within each sub-figure, Algorithm \ref{alg:r2_tight} is configured using identical specifications with $\beta=0.4, \bar{t}=0.3, \tau=0.1$, which leads to 145800 samples and a failure probability ($\epsilon$) that is extremely close to zero. It is repeated 100 times against each controller. The obtained risk estimates are identical, shown as the star in each sub-figure. The dots in all figures represent another 100 attempts of Algorithm \ref{alg:is_testing} with the RHW threshold of $s_r=0.001$ as the termination criterion. The $r^*$ for each controller is obtained through the approximated enumeration of all samples in the discretized sample space $V_d$ with sufficiently small resolution ($0.002$ m/s).
Figure 4: Comparison of the main proposal (Algorithm \ref{alg:r2_tight}) and the procedure described in Section \ref{sec:prob:direct} with respect to a variety of hyper-parameters. Note that the $\gamma$ values are shown on a $\log$ scale.
Figure 5: Repeatability and reliability comparison between Algorithm \ref{alg:r2_tight} and Algorithm \ref{alg:is_testing} with the RHW-based termination condition for the Rabbit push-over task against two different controllers (RL and ALIP). Within each sub-figure, Algorithm \ref{alg:r2_tight} is configured using identical specifications with $\beta=0.4, \bar{t}=0.35, \tau=0.2$, which leads to 10939 samples and a failure probability ($\epsilon$) that is extremely close to zero. It is repeated 35 times against each controller. The obtained risk estimates are identical, shown as the star in each sub-figure. The dots in all figures represent another 35 attempts of Algorithm \ref{alg:is_testing} with the RHW threshold of $s_r=0.13$ as the termination criterion. Note that $r^*$ is not presented in this study, as there is no tractable way to obtain the ground-truth risk estimate for this Rabbit push-over case.
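The captions above compare against a termination rule based on the relative half-width (RHW). The paper's exact definition is not reproduced on this page, so the sketch below assumes a common form: the half-width of a normal-approximation confidence interval divided by the current estimate, with sampling stopped once this ratio drops to $s_r$ or below. The toy outcome model (a Bernoulli failure indicator with probability 0.2), the `min_samples` guard, and all function names are illustrative assumptions.

```python
import math
import random


def run_until_rhw(draw, s_r, rng, z=1.96, min_samples=30, max_samples=100_000):
    """Sample until the relative half-width of the running mean is at most s_r.

    `draw(rng)` returns one (possibly importance-weighted) outcome.
    Returns the final estimate and the number of samples used.
    """
    n, total, total_sq = 0, 0.0, 0.0
    while n < max_samples:
        value = draw(rng)
        n += 1
        total += value
        total_sq += value * value
        if n >= min_samples and total > 0.0:
            mean = total / n
            variance = max(total_sq / n - mean * mean, 0.0)
            half_width = z * math.sqrt(variance / n)
            if half_width / mean <= s_r:
                break
    return total / n, n


def toy_outcome(rng):
    """Hypothetical test outcome: 1.0 on failure (probability 0.2), else 0.0."""
    return 1.0 if rng.random() < 0.2 else 0.0


# Repeat the same test with different random seeds, as different testers would.
results = [run_until_rhw(toy_outcome, s_r=0.13, rng=random.Random(seed)) for seed in range(5)]
for estimate, used in results:
    print(round(estimate, 4), used)
```

Because the stopping time depends on the samples themselves, both the reported estimate and the testing effort change from run to run. This is the kind of scatter the dots in Figures 3 and 5 illustrate, in contrast to the single star produced by the proposed algorithm.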

Theorems & Definitions (7)

Remark
Definition
Definition
Theorem
Proof
Remark
Remark
