Multi-Agent Reinforcement Learning with Long-Term Performance Objectives for Service Workforce Optimization

Kareem Eissa, Rayal Prasad, Sarith Mohan, Ankur Kapoor, Dorin Comaniciu, Vivek Singh

TL;DR

This work addresses the challenge of optimizing service-workforce operations across multiple interdependent decisions (dispatch, staffing, and positioning) over long horizons. It introduces a parameterized discrete-event simulator that unifies these tasks and exposes metrics for workforce cost, utilization, and downtime, enabling fair multi-objective optimization via Nash welfare. Through RL (PPO and IMPALA) and heuristic baselines, the study demonstrates that jointly trained agents outperform isolated or heuristic strategies, achieving better trade-offs and adaptability in dynamic, non-stationary environments. The proposed environment offers a scalable platform for advancing holistic workforce management with potential applications in healthcare, facilities services, and field operations, while outlining paths for future enhancements such as routing and hybrid service modes.
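As a rough illustration of why Nash welfare is described as a fair way to combine several objectives, the sketch below computes it as the geometric mean of per-metric utilities. This is an assumption-laden example, not the paper's formulation: the function name `nash_welfare`, the normalization of each metric to a higher-is-better utility in (0, 1], and the sample numbers are all hypothetical.

```python
import math
from typing import Sequence


def nash_welfare(utilities: Sequence[float]) -> float:
    """Geometric mean of non-negative utilities (a normalized Nash welfare).

    Assumes every metric was already mapped to a higher-is-better utility,
    e.g. cost and downtime inverted; that mapping is not taken from the paper.
    """
    if not utilities:
        raise ValueError("need at least one utility")
    if any(u < 0 for u in utilities):
        raise ValueError("utilities must be non-negative")
    if any(u == 0 for u in utilities):
        # A single zero utility drives the whole product to zero.
        return 0.0
    # Sum logs instead of multiplying directly, for numerical stability.
    log_sum = sum(math.log(u) for u in utilities)
    return math.exp(log_sum / len(utilities))


# Hypothetical utilities for (cost, utilization, downtime).
balanced = [0.5, 0.5, 0.5]
lopsided = [0.9, 0.9, 0.1]

print(sum(balanced), sum(lopsided))                    # plain sum favors lopsided
print(nash_welfare(balanced), nash_welfare(lopsided))  # Nash welfare favors balanced
```

A plain sum would prefer the lopsided outcome, whereas the product-based score penalizes sacrificing one metric to improve the others, which is the trade-off behavior the summary attributes to Nash welfare.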

Abstract

Workforce optimization plays a crucial role in efficient organizational operations where decision-making may span several different administrative and time scales. For instance, dispatching personnel to immediate service requests while managing talent acquisition across varied expertise creates a highly dynamic optimization problem. Existing work focuses on specific sub-problems such as resource allocation and facility location, which are solved with heuristics like local search and, more recently, deep reinforcement learning. However, these may not accurately represent real-world scenarios where such sub-problems are not fully independent. Our aim is to fill this gap by creating a simulator that models a unified workforce optimization problem. Specifically, we designed a modular simulator to support the development of reinforcement learning methods for integrated workforce optimization problems. We focus on three interdependent aspects: personnel dispatch, workforce management, and personnel positioning. The simulator provides configurable parameterizations to help explore dynamic scenarios with varying levels of stochasticity and non-stationarity. To facilitate benchmarking and ablation studies, we also include heuristic and RL baselines for the above-mentioned aspects.

Paper Structure

This paper contains 21 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: A simulated run of the workforce management environment. Each time step has a corresponding environment state showing the workforce ($N_t$ circles), facilities ($M_t$ squares), and assignments (lines). Notice that the personnel dispatch agent takes frequent actions ($N_t \times M_t$ assignments) to dispatch personnel to appropriate facilities. The workforce management agent, on the other hand, often takes no action, and when it does, the impact of the action is observed several time steps later.
  • Figure 2: Process flow for personnel dispatch to service requests generated by facilities.
  • Figure 3: Process flow for workforce management and personnel positioning agents to add or remove personnel. Note that when new personnel need to be added, the personnel positioning agent identifies the area in which to hire, and when the personnel count needs to be reduced, it selects the personnel to remove by identifying the area where a reduction would help.
  • Figure 4: Network Architecture. Note that the spatial output head does not leverage the workforce features since it only involves a voting mechanism among the facility locations.
  • Figure 5: Results for the scenario described in Section 4.3. Each row reports one of the metrics and each column reports one of the methods. In the middle row, the gray band represents ideal working conditions, with the Personnel Utilization Rate near 0.75 with a margin of 0.05.