Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?
Michael Doherty, Robin Matzner, Rasoul Sadeghi, Polina Bayvel, Alejandra Beghelli
TL;DR
This paper critically evaluates reinforcement learning (RL) approaches to dynamic resource allocation in optical networks, highlighting pervasive benchmarking gaps and reproducibility issues. By systematically recreating the problems from five influential RL studies and benchmarking heuristics extensively, the authors show that well-tuned simple heuristics match or outperform the published RL solutions across several network topologies. They introduce a defragmentation-based method (Resource-Prioritized Defragmentation) that gives empirical lower bounds on network blocking and thereby bounds the potential gains from more advanced methods: in their examples, at most 19-36% additional traffic load could be supported at a fixed low blocking probability. The work also releases the XLRON framework to enable reproducible evaluations and provides concrete benchmarking recommendations to promote rigorous, fair comparisons in the field.
Abstract
The application of reinforcement learning (RL) to dynamic resource allocation in optical networks has been the focus of intense research activity in recent years, with almost 100 peer-reviewed papers. We present a review of progress in the field and identify significant gaps in benchmarking practices and reproducibility. To determine the strongest benchmark algorithms, we systematically evaluate several heuristics across diverse network topologies. We find that the path count and sort criteria used for path selection significantly affect benchmark performance. We meticulously recreate the problems from five landmark papers and apply the improved benchmarks. Our comparisons demonstrate that simple heuristics consistently match or outperform the published RL solutions, often with an order of magnitude lower blocking probability. Furthermore, we present empirical lower bounds on network blocking using a novel defragmentation-based method, revealing that potential improvements over the benchmark heuristics are limited to 19-36% increased traffic load for the same blocking performance in our examples. We make our simulation framework and results publicly available to promote reproducible research and standardized evaluation: https://doi.org/10.5281/zenodo.12594495.
