A Model for Optimal Resilient Planning Subject to Fallible Actuators

Kyle Baldes; Diptanil Chaudhuri; Jason M. O'Kane; Dylan A. Shell

TL;DR

This work introduces a fallible actuator MDP (FA-MDP) to incorporate utilization-driven actuator failures into long-horizon planning. By modeling failure with a reliability function $\rho$ and malfunction transitions $F$, it enables anticipatory policies that preserve critical actuators for future opportunities, rather than exhaustively risking failure and re-planning. The authors propose a lattice-based solver (Lattice Planner) that operates on a DAG of actuator-subset states, leveraging a local Bellman operator with a contraction factor $\gamma\cdot\overline{\rho} < 1$ and hot-starting to accelerate convergence. Empirical results on gridworld tasks show that the lattice approach scales better than a monolithic solver, especially as the number of actuators grows, demonstrating practical resilience benefits for complex robotic systems.
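One way to read the contraction claim is sketched below, under assumptions this summary does not confirm. Let $A \subseteq \boldsymbol{U}$ be the set of still-functioning actuators, let $\rho(s,u)$ be the probability that actuator $u$ survives being used in state $s$, let $T$ denote the nominal transitions, and suppose a failure of $u$ moves the system via $F$ into configuration $A\setminus\{u\}$. The reward $R$, the operator name $B_A$, and the exact form of the backup are illustrative and are not quoted from the paper.

$$
(B_A V)(s) = \max_{u \in A}\Big[ R(s,u) + \gamma\Big( \rho(s,u)\sum_{s'} T(s' \mid s,u)\,V(s') + \big(1-\rho(s,u)\big)\sum_{s'} F(s' \mid s,u)\,V_{A\setminus\{u\}}(s') \Big) \Big]
$$

If the lower-level values $V_{A\setminus\{u\}}$ are held fixed, only the first sum depends on $V$, so for any two candidates $V$ and $V'$:

$$
\lVert B_A V - B_A V' \rVert_\infty \le \gamma\,\overline{\rho}\,\lVert V - V' \rVert_\infty, \qquad \overline{\rho} = \max_{s,u}\rho(s,u).
$$

Under this reading, each configuration is solved by an operator that contracts at rate $\gamma\cdot\overline{\rho}$, which is no larger than $\gamma$ because $\overline{\rho}\le 1$.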

Abstract

Robots incurring component failures ought to adapt their behavior to best realize still-attainable goals under reduced capacity. We formulate the problem of planning with actuators known a priori to be susceptible to failure within the Markov Decision Process (MDP) framework. The model captures utilization-driven malfunction and state-action dependent likelihoods of actuator failure in order to enable reasoning about potential impairment and the long-term implications of impoverished future control. This leads to behavior differing qualitatively from plans which ignore failure. As actuators malfunction, there are combinatorially many configurations which can arise. We identify opportunities to save computation through re-use, exploiting the observation that differing configurations yield closely related problems. Our results show how strategic solutions are obtained so robots can respond when failures do occur -- for instance, in prudently scheduling utilization in order to keep critical actuators in reserve.
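The re-use across configurations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `solve_lattice`, the callables `T`, `F`, `R`, and `rho`, the terminal values `v_empty`, and the hot-start rule (initialize each configuration from the element-wise maximum of its already-solved sub-configurations) are all assumptions made for illustration. The sketch assumes that using actuator `u` keeps it working with probability `rho(s, u)` and otherwise drops the robot into the configuration without `u`.

```python
from itertools import combinations


def solve_lattice(states, actuators, T, F, R, rho, v_empty, gamma=0.9, eps=1e-9):
    """Solve one value function per actuator subset, smallest subsets first.

    T(s, u) and F(s, u) return {next_state: probability} for the nominal and
    the malfunction transition respectively (hypothetical interface).
    v_empty gives the value of each state once no actuator remains.
    """
    V = {frozenset(): dict(v_empty)}
    for k in range(1, len(actuators) + 1):  # bottom-up through the lattice
        for combo in combinations(actuators, k):
            A = frozenset(combo)
            # Sub-configuration reached when actuator u fails (already solved).
            child = {u: V[A - {u}] for u in A}
            # Hot start (assumed rule): element-wise max over sub-configurations.
            v = {s: max(child[u][s] for u in A) for s in states}
            while True:
                delta = 0.0
                for s in states:  # in-place (asynchronous) backups
                    best = max(
                        R(s, u) + gamma * (
                            rho(s, u) * sum(p * v[t] for t, p in T(s, u).items())
                            + (1.0 - rho(s, u))
                            * sum(p * child[u][t] for t, p in F(s, u).items())
                        )
                        for u in A
                    )
                    delta = max(delta, abs(best - v[s]))
                    v[s] = best
                if delta < eps:
                    break
            V[A] = v
    return V


# Toy corridor (hypothetical): states 0..2, goal at 2, step cost -1.
GOAL, GAMMA = 2, 0.9
states = [0, 1, 2]
actuators = ["wheels", "tracks"]
reliability = {"wheels": 0.9, "tracks": 0.5}


def T(s, u):
    return {min(s + 1, GOAL): 1.0}  # nominal use: move one cell right


def F(s, u):
    return {s: 1.0}  # malfunction: stay in place


def R(s, u):
    return 0.0 if s == GOAL else -1.0


def rho(s, u):
    return reliability[u]


# With no actuators left the robot pays the step cost forever, unless at goal.
v_empty = {s: (0.0 if s == GOAL else -1.0 / (1.0 - GAMMA)) for s in states}
V = solve_lattice(states, actuators, T, F, R, rho, v_empty, gamma=GAMMA)
```

In this toy problem, `V[frozenset({"wheels"})][1]` works out to about -1.9: one step cost plus the discounted 10% chance of being stranded. Each configuration needs only the values of configurations with one fewer actuator, which is why they are solved in order of increasing subset size.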

Paper Structure (15 sections, 16 equations, 6 figures)

Introduction: Motivation and related work
Motivating scenario
Related work
Notation, preliminaries, and assumptions
Notation
Preliminaries
Assumptions
Failure model
Exploiting structure
Solving FA-MDP Problems
Experiments
Methodology
Asynchronous Value Iteration State Backup Ordering
Scaling
Conclusion

Figures (6)

Figure 1: A robot equipped with wheels and tracks travels from location A to B subject to both motion uncertainty and the possibility of actuator failure.
Figure 2: Computed policies for the situation in Figure 1 for specific reward values. (a) Policy derived from a $6\times6$ gridworld representation of Figure 1 that does not account for actuator failures. (b) Failure-aware policy. (c) Execution of the panglossian policy in (a) that, after failure, results in undesired behavior. At each grid cell, the control with the maximum expected future reward is displayed by indicating the actuator, direction of travel, and the cost-to-go for that control. At the goal state, denoted by G, all controls have the same expected future reward.
Figure 3: Value function lattice for a simple example with a set $\boldsymbol{U}$ comprising three elements. The small grids are a cartoon depiction of a $5\times 5$ state space emphasizing that each element is a value function; e.g., $V_{\{2,3\}}: S \to \mathbb{R}$ assigns values to each state for the situation when actuator $1$ has failed.
Figure 4: Value Function Operations vs State Backup Ordering: ($\gamma=0.99$, $\varepsilon_{{\rm desired}}=0.001$) Comparison of state backup orderings during asynchronous value iteration. Ordering distributions are represented by the box and whisker plots.
Figure 5: Value Function Operations vs Number of Actuators: Scaling the number of actuators from 2 to 12 using the gridworld from \ref{fig:famdpPlannerTop}.
...and 1 more figure
