Learning State-Dependent Policy Parametrizations for Dynamic Technician Routing with Rework

Jonas Stein; Florentin D Hildebrandt; Barrett W Thomas; Marlin W Ulmer

TL;DR

The paper tackles dynamic technician routing under heterogeneous skills and stochastic absences, modeling rework risk and deadlines as a sequential decision problem. It introduces a score-based assignment and routing policy whose core is a state-dependent parameter $\alpha_{S_t}$, learned via PPO-based reinforcement learning, that balances service urgency, routing efficiency, and safety (a low risk of rework). Computational results show that a dynamically tuned balance policy (DB) outperforms a static balance and several myopic benchmarks, reducing average customer inconvenience and rework while using fewer resources; accepting a small share of risky assignments (about 7%) can yield net gains. The findings highlight the value of state-aware prioritization in dynamic service routing and offer guidance for applying state-dependent RL tuning to related domains with operational uncertainty.

Abstract

Home repair and installation services require technicians to visit customers and resolve tasks of different complexity. Technicians often have heterogeneous skills and work experience. The geographical spread of customers makes it impractical to achieve only perfect matches between technician skills and task requirements. Additionally, technicians are regularly absent due to sickness. When a technician's skill does not perfectly match a task's requirements, the task may remain unresolved and require a revisit and rework. Companies seek to minimize customer inconvenience due to delay. We model the problem as a sequential decision process in which, over a number of service days, customers request service while heterogeneously skilled technicians are routed to serve the customers in the system. Each day, our policy iteratively builds tours by adding "important" customers. Importance is based on analytical considerations and integrates routing efficiency, urgency of service, and risk of rework. We propose a state-dependent balance of these factors via reinforcement learning. A comprehensive study shows that accepting a few non-perfect assignments can be quite beneficial for the overall service quality. We further demonstrate the value provided by a state-dependent parametrization.
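
The tour-building idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's scoring rule: the `importance` formula, the `Customer` fields, and the hand-written `alpha_for_state` rule are all assumptions made for illustration, and the last one only stands in for the mapping that the paper learns with reinforcement learning.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Customer:
    name: str
    x: float
    y: float
    days_to_deadline: int  # smaller means more urgent
    rework_prob: float     # assumed chance the visit leaves the task unresolved


def importance(c, pos, alpha):
    """Hypothetical importance score (higher is better), not the paper's formula."""
    detour = hypot(c.x - pos[0], c.y - pos[1])
    efficiency = 1.0 / (1.0 + detour)
    urgency = 1.0 / (1.0 + max(c.days_to_deadline, 0))
    # alpha in [0, 1] shifts weight from routing efficiency to urgency;
    # the rework probability discounts the whole score.
    return (1.0 - c.rework_prob) * (alpha * urgency + (1.0 - alpha) * efficiency)


def build_tour(customers, depot, alpha, capacity):
    """Greedily append the most important remaining customer to one tour."""
    tour, pos, remaining = [], depot, list(customers)
    while remaining and len(tour) < capacity:
        best = max(remaining, key=lambda c: importance(c, pos, alpha))
        tour.append(best.name)
        pos = (best.x, best.y)
        remaining.remove(best)
    return tour


def alpha_for_state(backlog, technicians_available):
    """Placeholder for a learned state-to-alpha mapping (hand-written rule)."""
    load = backlog / max(technicians_available, 1)
    return min(1.0, load / 10.0)


near = Customer("A", 1.0, 0.0, days_to_deadline=10, rework_prob=0.0)
far_urgent = Customer("B", 10.0, 0.0, days_to_deadline=0, rework_prob=0.0)
tour_efficient = build_tour([near, far_urgent], (0.0, 0.0), alpha=0.0, capacity=2)
tour_urgent = build_tour([near, far_urgent], (0.0, 0.0), alpha=1.0, capacity=2)
```

With `alpha=0.0` the sketch visits the nearby customer first, and with `alpha=1.0` it visits the customer whose deadline is due first. A state-dependent parametrization would pick `alpha` from the current state (here, the assumed backlog and technician count) instead of fixing it in advance.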

Paper Structure (44 sections, 2 theorems, 16 equations, 16 figures, 3 tables, 2 algorithms)

Introduction
Problem.
Methodology.
Related Literature
Problem Definition
Formal Problem Description
Example
Sequential Decision Process
Preliminaries.
Decision State.
Decisions.
Stochastic Information and Transition Function.
Solution.
Methodology
Motivation and Overview
...and 29 more sections

Key Result

Proposition 4.1

Given a post-decision state $S_t^x$ with deadlines $\delta^x_t$, we construct $S_t^{x\prime}$ such that $S_t^x$ and $S_t^{x\prime}$ are identical except for their corresponding customer deadlines, i.e., $\delta^x_{it}\leq\delta_{it}^{x\prime} \space \forall i \in (\mathcal{K}_t^{xu} \cup \mathcal{K}

Figures (16)

Figure 1: Decisions and resulting next states; I (top) and II (bottom)
Figure 2: Illustration of the three competing goals and our policy's functionality
Figure 3: Average inconvenience and average delay per customer
Figure 4: Learning curves for five algorithmic configurations (see Table \ref{tab: results_app})
Figure 5: Frequencies of unresolved services
...and 11 more figures

Theorems & Definitions (2)

Proposition 4.1: Monotonicity of the value function in $\boldsymbol{\delta_t}$
Corollary 4.1
