On Sequential Fault-Intolerant Process Planning

Andrzej Kaczmarczyk, Davin Choo, Niclas Boehmer, Milind Tambe, Haifeng Xu

TL;DR

SFIPP addresses fault-intolerant sequential decision-making with unknown stage-wise success probabilities, where the reward of a round is the product $\prod_{s=1}^m p_{s,i_s}$ of the chosen actions' success probabilities and learning occurs over a horizon of $T$ rounds. The authors introduce a staged-bandit framework that yields provably tight regret bounds in both deterministic and probabilistic settings, and show how collapsing stages of the same type can further reduce regret. They propose and analyze specialized algorithms (UTF, SB, SCB, SCFGB) that exploit problem structure, and demonstrate experimentally that these tailored methods outperform a generic bandit approach, with collapsing offering clear gains when stage types are correctly identified. The work advances fault-intolerant, multi-stage planning under uncertainty and has potential impact on domains such as drug discovery, security, and quality-critical design.

Abstract

We propose and study a planning problem we call Sequential Fault-Intolerant Process Planning (SFIPP). SFIPP captures a reward structure common in many sequential multi-stage decision problems where the planning is deemed successful only if all stages succeed. Such reward structures are different from classic additive reward structures and arise in important applications such as drug/material discovery, security, and quality-critical product design. We design provably tight online algorithms for settings in which we need to pick between different actions with unknown success chances at each stage. We do so both for the foundational case in which the behavior of actions is deterministic, and the case of probabilistic action outcomes, where we effectively balance exploration for learning and exploitation for planning through the usage of multi-armed bandit algorithms. In our empirical evaluations, we demonstrate that the specialized algorithms we develop, which leverage additional information about the structure of the SFIPP instance, outperform our more general algorithm.
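To make the contrast with additive rewards concrete, the following is a minimal Python sketch of the product reward described above. The matrix `P`, the helper names, and all numbers are illustrative assumptions rather than values from the paper; `P[s][i]` stands for the success probability $p_{s,i}$ of action $i$ at stage $s$.

```python
import math

def round_reward(P, plan):
    """Expected reward of one round: the product of the chosen success probabilities."""
    return math.prod(P[s][i] for s, i in enumerate(plan))

def best_plan(P):
    """With P known, a product of nonnegative factors is maximized stage by stage."""
    return [max(range(len(row)), key=row.__getitem__) for row in P]

def per_round_regret(P, plan):
    """Gap between the best achievable expected reward and that of `plan`."""
    return round_reward(P, best_plan(P)) - round_reward(P, plan)

# Hypothetical instance: m = 3 stages, k = 2 actions per stage.
P = [[0.9, 0.5],
     [0.8, 0.6],
     [0.0, 0.7]]

print(best_plan(P))                  # index of the best action at each stage
print(round_reward(P, [0, 0, 1]))    # all three stages can succeed
print(round_reward(P, [0, 0, 0]))    # one zero-probability stage forces reward 0
print(per_round_regret(P, [1, 1, 1]))
```

The third call shows the fault-intolerant aspect: two strong stages cannot compensate for one failing stage, which an additive reward would allow.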

Paper Structure

This paper contains 19 sections, 13 theorems, 27 equations, 8 figures, 4 algorithms.

Key Result

Theorem 2

Consider the SFIPP problem with deterministic success/failure where $z = |\{(s,i) \in [m] \times [k]: p_{s,i} = 0\}|$ denotes the number of zero entries in $\mathbf{P}$. Then, there is a randomized algorithm achieving expected regret (eq:SFIPP-benchmark-objective) of at most $\frac{z}{2}$. Furthermo
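One way to see where a bound of the form $\frac{z}{2}$ can come from is the small Python simulation below. It is a sketch under stated assumptions and not necessarily the authors' algorithm: it assumes that a round halts at the first failing stage and reveals which action failed, and that every stage has at least one action with $p_{s,i} = 1$, so the optimal reward is 1 per round. The strategy tries untested actions in a uniformly random order and keeps any action seen to succeed. Under these assumptions each failed round costs regret 1 and rules out exactly one zero entry, and a stage with $z_s$ zeros among $k$ actions contributes $\frac{z_s}{k - z_s + 1} \le \frac{z_s}{2}$ failures in expectation.

```python
import random

def simulate_regret(P, T, rng):
    """Regret of random-order exploration on a 0/1 matrix P over T rounds.

    Assumed feedback model (not taken from the paper): a round stops at the
    first failing stage, and the failing action is observed.
    """
    m = len(P)
    untested = [list(range(len(row))) for row in P]
    for order in untested:
        rng.shuffle(order)              # uniformly random testing order per stage
    known_good = [None] * m
    regret = 0
    for _ in range(T):
        for s in range(m):
            if known_good[s] is None:
                i = untested[s].pop()   # next untested action at this stage
            else:
                i = known_good[s]       # reuse an action known to succeed
            if P[s][i] == 1:
                known_good[s] = i
            else:
                regret += 1             # round fails; optimal reward would be 1
                break
    return regret

def expected_failures(P):
    """Sum over stages of z_s / (k - z_s + 1), the expected zeros tried first."""
    return sum(row.count(0) / (row.count(1) + 1) for row in P)

# Hypothetical instance with z = 3 zero entries.
P = [[0, 0, 1],
     [1, 0, 1]]
rng = random.Random(0)
runs = [simulate_regret(P, T=20, rng=rng) for _ in range(2000)]
print(sum(runs) / len(runs), expected_failures(P), 3 / 2)
```

The printed empirical mean should sit near `expected_failures(P)` and below $\frac{z}{2} = 1.5$; a single run can still incur up to $z$ failures, since the bound holds only in expectation.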

Figures (8)

  • Figure 1: All experimental plots for the experiments section. We provide error bar versions of them in the appendix.
  • Figure 2: Experimental results on deterministic processes. The bands around the curves represent standard deviations.
  • Figure 3: Experimental results on probabilistic processes with one stage type, i.e. all stages have the same unknown optimal action index, but with differing numbers of stages. Observe that as $m$ grows, the gap in accumulated regret widens between knowing and not knowing that all stages can be collapsed into a single "meta" stage. The bands around the curves represent standard deviations.
  • Figure 4: Experimental results on probabilistic processes with one stage type, i.e. all stages have the same unknown optimal action index. $\mathbf{P}$ entries are generated from $\mathrm{Beta}(\alpha = 10, \beta = 1)$. Observe that having the information to collapse into one (blue) or two (orange) stages leads to significant improvements in the accumulated regret. The bands around the curves represent standard deviations.
  • Figure 5: Experimental results on probabilistic processes with one stage type, i.e. all stages have the same unknown optimal action index. $\mathbf{P}$ entries are generated from $\mathrm{Beta}(\alpha = 1, \beta = 1)$. Observe that having the information to collapse into one (blue) or two (orange) stages leads to significant improvements in the accumulated regret. The bands around the curves represent standard deviations.
  • ...and 3 more figures

Theorems & Definitions (28)

  • Example 1
  • Theorem 2
  • Lemma 2
  • Lemma 2
  • proof : Proof Sketch
  • Lemma 2
  • proof : Proof sketch
  • proof : Proof of \ref{thm:deterministic-rand-alg}
  • Theorem 3
  • Lemma 3
  • ...and 18 more