A Model for Optimal Resilient Planning Subject to Fallible Actuators

Kyle Baldes, Diptanil Chaudhuri, Jason M. O'Kane, Dylan A. Shell

TL;DR

This work introduces a fallible actuator MDP (FA-MDP) to incorporate utilization-driven actuator failures into long-horizon planning. By modeling failure with a reliability function $\rho$ and malfunction transitions $F$, it enables anticipatory policies that preserve critical actuators for future opportunities, rather than exhaustively risking failure and re-planning. The authors propose a lattice-based solver (Lattice Planner) that operates on a DAG of actuator-subset states, leveraging a local Bellman operator with a contraction factor $\gamma\cdot\overline{\rho} < 1$ and hot-starting to accelerate convergence. Empirical results on gridworld tasks show that the lattice approach scales better than a monolithic solver, especially as the number of actuators grows, demonstrating practical resilience benefits for complex robotic systems.
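The lattice idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on simplifying assumptions that are not taken from the paper: motion is deterministic, only the actuator that was just used can fail (it survives with probability $\rho(s,a)$), the value with no working actuators is zero, and the hot start is the pointwise maximum over the already-solved smaller subsets. The corridor world and the names `solve_lattice`, `q_value`, `greedy`, `step`, `reward`, and `RELIABILITY` are all hypothetical.

```python
from itertools import combinations

def q_value(s, a, V, V_failed, step, reward, rho, gamma):
    """One-step lookahead: actuator a survives its use with probability rho(s, a)."""
    s2 = step(s, a)
    return reward(s, a) + gamma * (rho(s, a) * V[s2] + (1.0 - rho(s, a)) * V_failed[s2])

def solve_lattice(states, actuators, step, reward, rho, gamma=0.95, eps=1e-9):
    """Return {subset A: {state: V_A(state)}}, solving smaller subsets first."""
    values = {frozenset(): {s: 0.0 for s in states}}  # assumption: no actuators, no reward
    for k in range(1, len(actuators) + 1):            # bottom-up over the subset lattice
        for combo in combinations(sorted(actuators), k):
            A = frozenset(combo)
            child = {a: values[A - {a}] for a in A}   # sub-problems reached when a fails
            # Hot start (assumed heuristic): pointwise max over the solved children.
            V = {s: max(child[a][s] for a in A) for s in states}
            delta = eps
            while delta >= eps:                        # in-place value iteration on V_A only
                delta = 0.0
                for s in states:
                    best = max(q_value(s, a, V, child[a], step, reward, rho, gamma) for a in A)
                    delta = max(delta, abs(best - V[s]))
                    V[s] = best
            values[A] = V
    return values

def greedy(values, A, s, step, reward, rho, gamma=0.95):
    """Best actuator to use in state s when exactly the actuators in A still work."""
    return max(sorted(A), key=lambda a: q_value(s, a, values[A], values[A - {a}],
                                                step, reward, rho, gamma))

# Hypothetical corridor 0..4: leaving the rough cell 2 requires tracks.
GOAL, ROUGH = 4, 2
RELIABILITY = {"wheels": 0.95, "tracks": 0.70}  # made-up survival probabilities

def step(s, a):
    if s == GOAL or (s == ROUGH and a == "wheels"):
        return s
    return s + 1

def reward(s, a):
    return 1.0 if s != GOAL and step(s, a) == GOAL else 0.0

def rho(s, a):
    return RELIABILITY[a]

values = solve_lattice(range(GOAL + 1), RELIABILITY, step, reward, rho)
both = frozenset(RELIABILITY)
for s in range(GOAL):
    print(s, greedy(values, both, s, step, reward, rho), round(values[both][s], 4))
```

Because each $V_A$ is iterated with the smaller-subset values held fixed, the only self-referential term in the backup is scaled by $\gamma\,\rho(s,a)$, which is where a contraction factor of the form $\gamma\cdot\overline{\rho}$ comes from under these assumptions. In this toy instance the greedy choice with both actuators intact is to drive on wheels over the smooth cells and to reserve the less reliable tracks for the rough cell, the kind of anticipatory behavior the summary describes.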

Abstract

Robots incurring component failures ought to adapt their behavior to best realize still-attainable goals under reduced capacity. We formulate the problem of planning with actuators known a priori to be susceptible to failure within the Markov Decision Process (MDP) framework. The model captures utilization-driven malfunction and state-action dependent likelihoods of actuator failure in order to enable reasoning about potential impairment and the long-term implications of impoverished future control. This leads to behavior differing qualitatively from plans which ignore failure. As actuators malfunction, there are combinatorially many configurations which can arise. We identify opportunities to save computation through re-use, exploiting the observation that differing configurations yield closely related problems. Our results show how strategic solutions are obtained so robots can respond when failures do occur -- for instance, by prudently scheduling utilization in order to keep critical actuators in reserve.

Paper Structure (15 sections, 16 equations, 6 figures)

Figures (6)

  • Figure 1: A robot equipped with wheels and tracks travels from location A to B subject to both motion uncertainty and the possibility of actuator failure.
  • Figure 2: Computed policies for the situation in Figure 1 for specific reward values. (a) Policy derived from a $6\times6$ gridworld representation of Figure 1 that does not account for actuator failures. (b) Failure-aware policy. (c) Execution of the Panglossian policy in (a), which results in undesired behavior after a failure. At each grid cell, the control with the maximum expected future reward is displayed by indicating the actuator, direction of travel, and the cost-to-go for that control. At the goal state, denoted by G, all controls have the same expected future reward.
  • Figure 3: Value function lattice for a simple example with a set $\boldsymbol{U}$ comprising three elements. The small grids are a cartoon depiction of a $5\times 5$ state space emphasizing that each element is a value function; e.g., $V_{\{2,3\}}: S \to \mathbb{R}$ assigns values to each state for the situation when $1$ has failed.
  • Figure 4: Value Function Operations vs State Backup Ordering: ($\gamma=0.99$, $\varepsilon_{{\rm desired}}=0.001$) Comparison of state backup orderings during asynchronous value iteration. Ordering distributions are represented by the box and whisker plots.
  • Figure 5: Value Function Operations vs Number of Actuators: Scaling the number of actuators from 2 to 12 using the gridworld from \ref{fig:famdpPlannerTop}.
  • ...and 1 more figure
