Stochastic Dynamic Network Utility Maximization with Application to Disaster Response
Anna Scaglione, Nurullah Karakoc
TL;DR
The paper addresses resource allocation under stochastic dynamics in large-scale disaster response by formulating a stochastic dynamic NUM problem over multiple local MDPs coupled by a global resource cap. It solves this problem with a distributed primal-dual approach: the local subproblems F_l(y^l) are solved with deep reinforcement learning on agent-based simulations, while a central layer uses dual prices to coordinate allocations. To keep the online computation tractable, the authors build a concave, non-decreasing interpolation \hat{F}_l(y) from samples and prove a bound on the resulting optimality gap. The methodology is validated on two case studies, pandemic vaccine distribution and wildfire firefighting, which demonstrate rolling-horizon reallocation that adapts to ground data and forecasts. The work provides a practical, scalable framework for ICS-style disaster response that combines DRL-based local optimization with market-like global coordination.
Abstract
In this paper, we are interested in solving Network Utility Maximization (NUM) problems whose underlying local utilities and constraints depend on a complex stochastic dynamic environment. While the general model applies broadly, this work is motivated by resource sharing during disasters occurring concurrently in multiple areas. In such situations, hierarchical layers of Incident Command Systems (ICS) are engaged; specifically, a central entity (e.g., the federal government) typically coordinates the incident response by allocating resources to different sites, and local entities then distribute them to those affected. The benefits of an allocation decision to the different sites are generally not expressed explicitly as a closed-form utility function because of the complexity of the response and the random nature of the underlying phenomenon we try to contain. We use the classic approach of decomposing the NUM formulation and applying a primal-dual algorithm to achieve optimal higher-level decisions under coupled constraints while modeling the optimized response to the local dynamics with deep reinforcement learning algorithms. The decomposition we propose has several benefits: 1) the entities respond to their local utilities based on a congestion signal conveyed by the ICS upper layers; 2) the complexity of capturing the utility of local responses and their diversity is addressed effectively without sharing local parameters and priorities with the ICS layers above; 3) utilities, not known as explicit functions, are approximated as convex functions of the resources allocated; 4) decisions rely on up-to-date data from the ground along with future forecasts.
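The price-based coordination described in the abstract can be illustrated with a minimal dual-ascent sketch. It is not the authors' implementation: in the paper the local values come from deep reinforcement learning on simulations, whereas here each site is assumed to have a closed-form concave utility w_l * log(1 + y) purely as a stand-in. The names `local_response`, `dual_ascent`, `weights`, and `budget` are hypothetical.

```python
def local_response(weight, price, cap):
    """Best response of one site to the congestion price.

    Assumed stand-in utility: weight * log(1 + y). The site maximizes
    weight * log(1 + y) - price * y over y in [0, cap], which has the
    closed-form solution y = weight / price - 1, clipped to [0, cap].
    """
    if price <= 0.0:
        return cap  # a free resource is requested up to the cap
    return min(cap, max(0.0, weight / price - 1.0))


def dual_ascent(weights, budget, step=0.05, iters=500):
    """Central layer: adjust one price until total demand meets the budget."""
    price = 1.0
    for _ in range(iters):
        # Each site reacts to the price using only its own utility.
        alloc = [local_response(w, price, budget) for w in weights]
        # Raise the price when demand exceeds the budget, lower it otherwise.
        excess = sum(alloc) - budget
        price = max(0.0, price + step * excess)
    alloc = [local_response(w, price, budget) for w in weights]
    return alloc, price


if __name__ == "__main__":
    alloc, price = dual_ascent(weights=[1.0, 2.0, 3.0], budget=6.0)
    print("allocations:", [round(a, 3) for a in alloc])
    print("price:", round(price, 3))
```

Only the scalar price travels down and only the requested amounts travel up, which mirrors the abstract's point that sites need not share local parameters or priorities with the upper ICS layers.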
