Welfare Maximization Algorithm for Solving Budget-Constrained Multi-Component POMDPs

Manav Vora, Pranay Thangeda, Michael N. Grussing, Melkior Ornik

TL;DR

The paper addresses budget-constrained decision making for groups of independently evolving components modeled as a multi-component POMDP. It introduces a budgeted-POMDP (b-POMDP) that augments the state with a budget-tracking cost and uses POMCP for online policy synthesis, proving concavity of the value function in the budget for a special case. To handle multiple components, the authors formulate a welfare maximization problem to optimally split the total budget across per-component b-POMDPs, yielding approximately optimal, budget-compliant policies. The approach is demonstrated on infrastructure maintenance and inspection scenarios, where it significantly outperforms existing practice and demonstrates practical feasibility with real data and scalable computation.

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide an efficient way to model real-world sequential decision making processes. Motivated by the problem of maintenance and inspection of a group of infrastructure components with independent dynamics, this paper presents an algorithm to find the optimal policy for a multi-component budget-constrained POMDP. We first introduce a budgeted-POMDP model (b-POMDP) which enables us to find the optimal policy for a POMDP while adhering to budget constraints. Next, we prove that the value function or maximal collected reward for a b-POMDP is a concave function of the budget for the finite horizon case. Our second contribution is an algorithm to calculate the optimal policy for a multi-component budget-constrained POMDP by finding the optimal budget split among the individual component POMDPs. The optimal budget split is posed as a welfare maximization problem and the solution is computed by exploiting the concave nature of the value function. We illustrate the effectiveness of the proposed algorithm by proposing a maintenance and inspection policy for a group of real-world infrastructure components with different deterioration dynamics, inspection and maintenance costs. We show that the proposed algorithm vastly outperforms the policy currently used in practice.
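
The budget-split step described above can be illustrated with a minimal sketch. It assumes the budget is discretized into integer units and that each component's value is a known concave function of its allotted budget. The names `split_budget` and `capped_linear`, and the toy value functions, are hypothetical stand-ins: in the paper the per-component values come from solving b-POMDPs, which this sketch does not attempt. The greedy marginal-gain rule shown here is a standard technique for maximizing a sum of concave functions under a shared budget and is not necessarily the authors' exact procedure.

```python
import heapq
from typing import Callable, List


def capped_linear(rate: int, cap: int) -> Callable[[int], int]:
    """Hypothetical concave value function: `rate` per budget unit up to `cap` units."""
    return lambda b: rate * min(b, cap)


def split_budget(values: List[Callable[[int], float]],
                 total_budget: int,
                 step: int = 1) -> List[int]:
    """Greedy split of `total_budget` across components.

    Each iteration gives `step` budget units to the component with the
    largest marginal gain. With concave value functions, marginal gains
    are non-increasing, so this greedy choice maximizes the summed value
    over allocations that are multiples of `step`.
    """
    alloc = [0] * len(values)
    # Max-heap emulated with negated gains: (-marginal_gain, component index).
    heap = [(-(v(step) - v(0)), i) for i, v in enumerate(values)]
    heapq.heapify(heap)
    remaining = total_budget
    while remaining >= step and heap:
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # the best remaining gain is non-positive: stop spending
        alloc[i] += step
        remaining -= step
        b = alloc[i]
        # Push the component back with its next marginal gain.
        heapq.heappush(heap, (-(values[i](b + step) - values[i](b)), i))
    return alloc


# Toy components (assumed, not from the paper's data set).
demo_values = [capped_linear(3, 5), capped_linear(2, 4), capped_linear(1, 10)]

if __name__ == "__main__":
    print(split_budget(demo_values, 8))
```

With a total budget of 8 units, the first toy component absorbs 5 units at a gain of 3 each, and the remaining 3 units go to the second component. Concavity is what makes this greedy rule safe: without it, a component's later budget units could be worth more than its earlier ones, and a locally best choice could miss the best overall split.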

Paper Structure

This paper contains 14 sections, 6 theorems, 29 equations, 3 figures, 1 table.

Key Result

Lemma 4.1

For a given budget $b$ and horizon $H$, the value function $V_H(s_0,b)$ is an increasing function of the state $s_0$, i.e., for two states $s_0$ and $s_0^\prime$ such that $s_0 < s_0^\prime$, the following holds: $V_H(s_0,b) \leq V_H(s_0^\prime,b)$.

Figures (3)

  • Figure 1: Comparison of the proposed and baseline approaches using time-to-failure for a range of budget values. (a) Overall results obtained by averaging over all components. (b) Results for the Air Handling Unit component with a replacement cost of 250 units. (c) Results for the Lighting Equipment component with a replacement cost of 24 units.
  • Figure 2: Sample condition index (CI) histories comparing the proposed policy with the baseline for the Boiler component, with a replacement cost of 45 units, an inspection cost of 1 unit, and a total budget of 500 units. (a) CI history using the proposed approach, showing failure at 80 time steps. (b) CI history using the baseline approach, showing failure at 39 time steps.
  • Figure 3: Comparison of the baseline and proposed budget allocation approaches for all 20 components with an overall budget of 10,000 units.

Theorems & Definitions (11)

  • Lemma 4.1
  • proof
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Theorem 4.5
  • proof
  • ...and 1 more