Analyzing homogenous and heterogeneous multi-server queues via neural networks
Eliran Sherzer
TL;DR
The paper tackles predicting the stationary distribution of the number of customers in two multi-server queue models, GI/GI/c and GI/GI_i/2, under non-Markovian inputs. It uses a neural network trained on features derived from the first four moments of the inter-arrival and service-time distributions, with Phase-Type sampling to generate diverse inputs and offline simulations to label the data. The resulting model achieves state-of-the-art accuracy for GI/GI/c and is the first to predict the full stationary distribution for GI/GI_i/2, with errors below 5% in most cases. Fast, parallel inference enables real-time decision support. A key finding is that including moments beyond the fourth provides little to no improvement and can even hurt performance, which guides efficient feature selection. The work also demonstrates practical utility through a numerical optimization example and openly provides code for replication and for extension to more complex queueing networks.
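Because the inputs are sampled as Phase-Type (PH) distributions and the features are their first four moments, it may help to see how those moments follow from a PH representation. The sketch below assumes the standard formula $E[X^k] = k!\,\alpha(-S)^{-k}\mathbf{1}$ for $X \sim PH(\alpha, S)$; the helper names `solve` and `ph_moments` are hypothetical and are not taken from the paper's code. Instead of inverting $-S$, it solves one linear system per moment.

```python
from math import factorial
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ph_moments(alpha: Sequence[float], S: Matrix, k_max: int = 4) -> List[float]:
    """Return [E[X^1], ..., E[X^k_max]] for X ~ PH(alpha, S) (assumed standard formula)."""
    neg_s = [[-v for v in row] for row in S]
    v = [1.0] * len(S)  # column vector of ones
    moments = []
    for k in range(1, k_max + 1):
        v = solve(neg_s, v)  # after this step, v = (-S)^{-k} 1
        moments.append(factorial(k) * sum(a * x for a, x in zip(alpha, v)))
    return moments
```

For example, an exponential distribution with rate 0.5 is $PH([1], [[-0.5]])$, and `ph_moments([1.0], [[-0.5]])` yields its moments $k!/0.5^k$, i.e. 2, 8, 48, 384. Whether the paper normalizes or transforms these moments before feeding them to the network is not stated here.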
Abstract
In this paper, we use a machine learning approach to predict the stationary distribution of the number of customers in a single-station multi-server system. We consider two systems: the first has $c$ homogeneous servers, namely the $GI/GI/c$ queue; the second has two heterogeneous servers, namely the $GI/GI_i/2$ queue. We train a neural network for these queueing models, using the first four moments of the inter-arrival and service time distributions. We demonstrate empirically that adding the fifth moment and beyond does not increase accuracy. Compared to existing methods, our model is state-of-the-art for both the stationary distribution and the mean number of customers in a $GI/GI/c$ queue. Further, ours is the only method that predicts the stationary distribution of the number of customers in a $GI/GI_i/2$ queue. We conduct a thorough performance evaluation to verify that our model is accurate; in most cases, the error is below 5\%. Finally, we show that inference is very fast: 5000 inferences can be made in parallel within a fraction of a second.
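The prediction target is the stationary distribution of the number of customers in the system, and training labels of this kind are typically obtained by simulation. The sketch below is a minimal illustration, assuming FCFS service and homogeneous servers (the $GI/GI/c$ case). It estimates the time-average distribution of the number in system; the function name `simulate_gg_c` and its parameters are hypothetical and do not describe the paper's actual simulator.

```python
import heapq
import random
from collections import defaultdict
from typing import Callable, Dict


def simulate_gg_c(
    interarrival: Callable[[random.Random], float],
    service: Callable[[random.Random], float],
    c: int,
    n_customers: int,
    seed: int = 0,
) -> Dict[int, float]:
    """Hypothetical sketch: time-average P(N = n) for a FCFS GI/GI/c queue starting empty."""
    rng = random.Random(seed)
    free_at = [0.0] * c  # min-heap: time at which each server next becomes free
    events = []  # (time, +1 for an arrival, -1 for a departure)
    t = 0.0
    for _ in range(n_customers):
        t += interarrival(rng)
        # Under FCFS, the next customer takes the server that frees up earliest.
        start = max(t, heapq.heappop(free_at))
        depart = start + service(rng)
        heapq.heappush(free_at, depart)
        events.append((t, 1))
        events.append((depart, -1))
    horizon = t  # observe the system up to the last arrival
    events.sort()
    time_in_state: Dict[int, float] = defaultdict(float)
    n, last = 0, 0.0
    for time, delta in events:
        if time > horizon:
            break
        time_in_state[n] += time - last  # time spent with n customers present
        n += delta
        last = time
    time_in_state[n] += horizon - last
    return {k: v / horizon for k, v in sorted(time_in_state.items())}
```

As a sanity check, an $M/M/2$ queue with arrival and per-server service rates of 1 has $P(N=0) = P(N=1) = 1/3$, which `simulate_gg_c(lambda r: r.expovariate(1.0), lambda r: r.expovariate(1.0), c=2, n_customers=200_000)` should approximate. The mean number of customers follows as `sum(n * p for n, p in dist.items())`. The sketch ignores warm-up bias from the empty start, and the heterogeneous $GI/GI_i/2$ case would additionally need a rule for choosing between idle servers, which is not modeled here.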
