Multi-Agent Reinforcement Learning with Long-Term Performance Objectives for Service Workforce Optimization
Kareem Eissa, Rayal Prasad, Sarith Mohan, Ankur Kapoor, Dorin Comaniciu, Vivek Singh
TL;DR
This work addresses the challenge of optimizing service-workforce operations across multiple interdependent decisions (dispatch, staffing, and positioning) over long horizons. It introduces a parameterized discrete-event simulator that unifies these tasks and exposes metrics for workforce cost, utilization, and downtime, enabling fair multi-objective optimization via Nash welfare. Through RL (PPO and IMPALA) and heuristic baselines, the study demonstrates that jointly trained agents outperform isolated or heuristic strategies, achieving better trade-offs and adaptability in dynamic, non-stationary environments. The proposed environment offers a scalable platform for advancing holistic workforce management with potential applications in healthcare, facilities services, and field operations, while outlining paths for future enhancements such as routing and hybrid service modes.
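To make the Nash-welfare aggregation concrete, here is a minimal sketch of how several normalized objectives could be combined into one fairness-aware score. It assumes each metric (for example cost, utilization, downtime) has already been rescaled to a utility in [0, 1] where higher is better. The function name `nash_welfare`, the `eps` floor, and the example values are illustrative assumptions and are not taken from the paper.

```python
import math


def nash_welfare(utilities, eps=1e-8):
    """Geometric mean of non-negative utilities (a normalized Nash welfare).

    Assumption: every utility is already scaled so that higher is better.
    `eps` is a hypothetical floor that keeps log() finite when a utility is 0.
    """
    if not utilities:
        raise ValueError("need at least one utility")
    if any(u < 0 for u in utilities):
        raise ValueError("utilities must be non-negative")
    # Sum of logs is the log of the product; dividing by n gives the geometric mean.
    log_sum = sum(math.log(max(u, eps)) for u in utilities)
    return math.exp(log_sum / len(utilities))


# Two hypothetical outcomes with the same arithmetic mean (0.6).
balanced = nash_welfare([0.6, 0.6, 0.6])
skewed = nash_welfare([0.9, 0.8, 0.1])
```

Because the score is a product rather than a sum, an outcome that sacrifices one objective almost entirely scores lower than a balanced outcome with the same average, which is the sense in which the aggregation is fair across objectives.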
Abstract
Workforce optimization plays a crucial role in efficient organizational operations, where decision-making may span several administrative levels and time scales. For instance, dispatching personnel to immediate service requests while also managing the acquisition of talent with varied expertise creates a highly dynamic optimization problem. Existing work focuses on specific sub-problems such as resource allocation and facility location, which are solved with heuristics like local search and, more recently, deep reinforcement learning. However, these isolated formulations may not accurately represent real-world scenarios, where the sub-problems are not fully independent. Our aim is to fill this gap with a simulator that models a unified workforce optimization problem. Specifically, we designed a modular simulator to support the development of reinforcement learning methods for integrated workforce optimization problems. We focus on three interdependent aspects: personnel dispatch, workforce management, and personnel positioning. The simulator provides configurable parameterizations to help explore dynamic scenarios with varying levels of stochasticity and non-stationarity. To facilitate benchmarking and ablation studies, we also include heuristic and RL baselines for these three aspects.
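As a rough illustration of what a parameterized discrete-event simulation with stochastic, non-stationary demand can look like, the toy sketch below runs an event queue in which requests arrive at random and the first idle worker is dispatched. It is not the simulator described here: the function `simulate`, all of its parameters, and the linear `drift` model of non-stationarity are hypothetical, and drawing each inter-arrival gap from the current rate is only an approximation of a time-varying arrival process.

```python
import heapq
import random


def simulate(num_workers=2, horizon=100.0, base_rate=0.1, drift=0.0,
             service_time=5.0, seed=0):
    """Toy discrete-event loop (illustrative only, all names hypothetical).

    base_rate: initial request arrival rate (stochasticity).
    drift:     non-negative linear growth of the rate over time
               (a crude stand-in for non-stationarity).
    """
    rng = random.Random(seed)
    events = []  # min-heap of (time, sequence number, kind)
    seq = 0
    heapq.heappush(events, (rng.expovariate(base_rate), seq, "arrival"))
    idle, waiting, served, busy_time = num_workers, 0, 0, 0.0

    while events:
        t, _, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            waiting += 1
            rate = base_rate * (1.0 + drift * t)  # rate changes over time
            seq += 1
            heapq.heappush(events, (t + rng.expovariate(rate), seq, "arrival"))
        else:  # a worker finished a job
            idle += 1
            served += 1
        # Heuristic dispatch: assign waiting requests to idle workers.
        while idle > 0 and waiting > 0:
            idle -= 1
            waiting -= 1
            busy_time += min(service_time, horizon - t)  # clip at the horizon
            seq += 1
            heapq.heappush(events, (t + service_time, seq, "done"))

    utilization = busy_time / (num_workers * horizon)
    return {"served": served, "backlog": waiting, "utilization": utilization}


stationary = simulate(drift=0.0, seed=1)
drifting = simulate(drift=0.05, seed=1)
```

Fixing the seed makes runs reproducible, so changing one parameter at a time (here `drift`) isolates its effect on metrics such as utilization and backlog, which is the kind of controlled comparison that benchmarking and ablation studies rely on.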
