Scalable Policies for the Dynamic Traveling Multi-Maintainer Problem with Alerts

Peter Verleijsdonk, Willem van Jaarsveld, Stella Kapodistria

TL;DR

This work studies the dynamic traveling multi-maintainer problem with alerts ($K$-DTMPA) under perfect condition information, with the objective of devising scalable solution approaches for maintaining large networks with $K$ maintenance engineers. It demonstrates that DRL can solve single-maintainer instances up to optimality, regardless of the chosen initial solution.

Abstract

Downtime of industrial assets such as wind turbines and medical imaging devices is costly. To avoid such downtime costs, companies seek to initiate maintenance just before failure, which is challenging because: (i) asset failures are notoriously difficult to predict, even in the presence of real-time monitoring devices that signal degradation; and (ii) limited resources are available to serve a network of geographically dispersed assets. In this work, we study the dynamic traveling multi-maintainer problem with alerts ($K$-DTMPA) under perfect condition information, with the objective of devising scalable solution approaches for maintaining large networks with $K$ maintenance engineers. Since such large-scale $K$-DTMPA instances are computationally intractable, we propose an iterative deep reinforcement learning (DRL) algorithm that optimizes long-term discounted maintenance costs. The efficiency of the DRL approach is vastly improved by a reformulation of the action space (which relies on the Markov structure of the underlying problem) and by choosing a suitable initial solution. The initial solution is created by extending existing heuristics with a dispatching mechanism. These extensions further serve as compelling benchmarks for tailored instances. We demonstrate through extensive numerical experiments that DRL can solve single-maintainer instances up to optimality, regardless of the chosen initial solution. Experiments with hospital networks containing up to $35$ assets show that the proposed DRL algorithm is scalable. Lastly, the trained policies are shown either to be robust against network modifications, such as removing an asset or an engineer, or to yield a suitable initial solution for the DRL approach.

Paper Structure

This paper contains 36 sections, 13 equations, 9 figures, 15 tables.

Figures (9)

  • Figure 1: (Figure best viewed in color.) Visualization of the $K$-DTMPA model for an asset network of $M=8$ machines serviced by $K=3$ maintenance engineers. The dot on top of each machine node is blue when the machine is healthy, orange when it is alerted, and red when it is down. The engineers are colored cyan, green and purple and are located in Amsterdam, Maastricht and Utrecht, respectively. At discrete decision epochs, each engineer can either: (i) idle/continue, (ii) travel to another location or (iii) start maintenance at the current location.
  • Figure 2: Visualization and description of the three envisioned policy aspects.
  • Figure 3: Visualization of the simulation-based policy $\hat{\pi}^+$. In state $h^{a_{k-1}}$, the policy prescribes to follow the action $\tilde{a} \in \mathcal{U}_k(h^{a_{k-1}})$ (recall that $\mathcal{U}_k(h^{a_{k-1}}) \subseteq \{u_{m}\}_{m=1}^M \cup \{v\}$) that minimizes the average undiscounted trajectory cost $\hat{q}_{\pi_0}(h^{a_{k-1}}, \tilde{a})$. Unbiased estimates of the action-value function are computed from $r$ independent roll-out simulations whose length follows a Geometric distribution with parameter $1-\gamma$.
  • Figure 4: (Figure best viewed in color.) The Dutch academic hospitals with the corresponding travel time matrix $\Theta$ in quarters. The engineers are colored cyan, green and purple and are located in Amsterdam, Maastricht and Rotterdam, respectively. For the decomposition heuristic, appropriate clusters are constructed using $K$-means clustering; locations within the respective clusters of engineers are colored accordingly.
  • Figure 5: The subset of Dutch city hospitals with the corresponding degradation matrices $\tilde{\textrm{Q}}$3 and $\tilde{\textrm{Q}}$4. The color-coding is as in Figure 4; the additional engineers are colored brown and yellow and are located in Arnhem and Groningen, respectively. For the decomposition heuristic, appropriate clusters are constructed using $K$-means clustering; locations within the respective clusters of engineers are colored accordingly.
  • ...and 4 more figures
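
The caption of Figure 3 states that roll-outs of Geometric length with parameter $1-\gamma$ give unbiased estimates of the discounted action-value function from undiscounted trajectory costs. The reason is that a roll-out whose length $T$ satisfies $P(T > t) = \gamma^t$ includes the cost at step $t$ with probability exactly $\gamma^t$. The following is a minimal sketch of that idea on a toy cost stream, not the authors' implementation: the names `geometric_rollout_cost`, `estimate_discounted_cost`, `exact_discounted_cost` and the deterministic `step_cost` function are assumptions made for illustration, and a real roll-out would simulate the $K$-DTMPA dynamics under the base policy $\pi_0$ instead.

```python
import random


def geometric_rollout_cost(step_cost, gamma, rng):
    """Sum undiscounted costs over one roll-out of Geometric(1 - gamma) length.

    step_cost is a hypothetical stand-in for simulating one decision epoch:
    it maps the step index t to the cost incurred at that step.
    """
    total = 0.0
    t = 0
    while True:
        total += step_cost(t)
        t += 1
        # Continue with probability gamma, stop with probability 1 - gamma,
        # so that P(roll-out length > t) = gamma ** t.
        if rng.random() >= gamma:
            return total


def estimate_discounted_cost(step_cost, gamma, r, seed=0):
    """Average the undiscounted cost of r independent roll-outs."""
    rng = random.Random(seed)
    costs = (geometric_rollout_cost(step_cost, gamma, rng) for _ in range(r))
    return sum(costs) / r


def exact_discounted_cost(step_cost, gamma, horizon=10_000):
    """Reference value: truncated discounted sum of the same cost stream."""
    return sum(gamma ** t * step_cost(t) for t in range(horizon))


if __name__ == "__main__":
    gamma = 0.9
    cost = lambda t: 1.0 if t % 2 == 0 else 3.0  # toy alternating costs
    print("estimate:", estimate_discounted_cost(cost, gamma, r=20_000))
    print("exact   :", exact_discounted_cost(cost, gamma))
```

With enough roll-outs the two printed values should be close. In a roll-out policy of the kind shown in Figure 3, one such estimate would be computed per candidate action and the action with the smallest estimate selected.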