Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?

Michael Doherty, Robin Matzner, Rasoul Sadeghi, Polina Bayvel, Alejandra Beghelli

TL;DR

This paper documents a critical evaluation of reinforcement learning approaches for dynamic resource allocation in optical networks, highlighting pervasive benchmarking gaps and reproducibility issues. Through a systematic recreation of five influential RL studies and extensive heuristic benchmarking, the authors show that well-tuned simple heuristics can outperform published RL solutions across several network topologies. They introduce an empirical defragmentation-based lower-bound method (Resource-Prioritized Defragmentation) to bound the potential gains from advanced methods, finding 19%–36% additional traffic load could be supported at a fixed low blocking probability in their examples. The work also releases the XLRON framework to enable reproducible evaluations and provides concrete benchmarking recommendations to advance rigorous, fair comparisons in the field.

Abstract

The application of reinforcement learning (RL) to dynamic resource allocation in optical networks has been the focus of intense research activity in recent years, with almost 100 peer-reviewed papers. We present a review of progress in the field, and identify significant gaps in benchmarking practices and reproducibility. To determine the strongest benchmark algorithms, we systematically evaluate several heuristics across diverse network topologies. We find that path count and sort criteria for path selection significantly affect the benchmark performance. We meticulously recreate the problems from five landmark papers and apply the improved benchmarks. Our comparisons demonstrate that simple heuristics consistently match or outperform the published RL solutions, often with an order of magnitude lower blocking probability. Furthermore, we present empirical lower bounds on network blocking using a novel defragmentation-based method, revealing that potential improvements over the benchmark heuristics are limited to 19-36% increased traffic load for the same blocking performance in our examples. We make our simulation framework and results publicly available to promote reproducible research and standardized evaluation: https://doi.org/10.5281/zenodo.12594495.

Paper Structure

This paper contains 25 sections, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: Count of publications related to RL for resource allocation problems in optical networks. Citations for each classification category are: RWA [garcia_multicast_2003, pointurier_reinforcement_2007, koyanagi_reinforcement_2009, suarez-varela_routing_2019, shiraki_dynamic_2019, shiraki_reinforcement-learning-based_2019, tzanakaki_self-learning_2020, zhao_cost-efficient_2021, freire-hermelo_dynamic_2021, liu_waveband_2021, nevin_techniques_2022, di_cicco_deep_2022, di_cicco_deepls_2023, nallaperuma_interpreting_2023]; RSA [reyes_adaptive_2017, li_deepcoop_2020, li_multi-objective_2020, romero_reyes_towards_2021, zhao_reinforced_2021, wang_dynamic_2021, quang_magc-rsa_2022, cruzado_reinforcement-learning-based_2022, zhao_rsa_2022, jiao_reliability-oriented_2022, almasan_deep_2022, wu_service_2022, arce_reinforcement_2022, sharma_deep_2023, lin_deep-reinforcement-learning-based_2023, hernandez-chulde_experimental_2023, cheng_ptrnet-rsa_2024, chen_gsaddqn_2024]; RMSA [chen_deeprmsa_2019, wang_resource_2020, shimoda_mask_2021, shi_deep-reinforced_2021, shimoda_deep_2021, sheikh_multi-band_2021, xu_spectrum_2021, chen_multi-task-learning-based_2021, gonzalez_improving_2022, bryant_q-learning_2022, terki_routing_2022, tang_deep_2022, cheng_routing_2022, xu_deep_2022, tang_heuristic_2022, tu_entropy-based_2022, momo_ziazet_deep_2022, pinto-rios_resource_2023, errea_deep_2023, beghelli_approaches_2023, terki_routing_2023, tanaka_pre-_2023, xu_hierarchical_2023, sadeghi_performance_2023, tang_routing_2023, teng_deep-reinforcement-learning-based_2024, xiong_graph_2024, teng_drl-assisted_2024, unzain_reinforcement_2024, zhou_opti-deeproute_2024, li_opticgai_2024, xie_physical_2024, yan_drl-based_2024]; Other [boyan_packet_1993, ma_demonstration_2019, zhao_reinforcement-learning-based_2019, wang_subcarrier-slot_2019, luo_leveraging_2019, natalino_optical_2020, ma_co-allocation_2020, wang_deepcms_2020, weixer_reinforcement_2020, liu_multi-agent_2021, zhao_service_2021, tian_reconfiguring_2021, morales_multi-band_2021, tanaka_reinforcement-learning-based_2022, koch_reinforcement_2022, hernandez-chulde_evaluation_2022, etezadi_deepdefrag_2022, etezadi_deep_2023, tanaka_adaptive_2023, johari_drl-assisted_2023, zhang_admire_2023, fan_blocking-driven_2023, lian_dynamic_2024, li_tabdeep_2024, wang_availability-aware_2024, yin_dnn_2024, tse_reinforcement_2024, tanaka_reinforcement-learning-based_2024, doherty_xlron_2024, natalino_optical_2024, mccann_sdonsim_2024, jara_dream-gym_2024].
  • Figure 2: Network topologies used in our case studies from: DeepRMSA, Reward-RMSA, GCN-RMSA, MaskRSA, PtrNet-RSA [chen_deeprmsa_2019, tang_heuristic_2022, xu_deep_2022, shimoda_mask_2021, cheng_ptrnet-rsa_2024]. We note that the USNET topology differs between GCN-RMSA and PtrNet-RSA. We show the GCN-RMSA version here. PtrNet-RSA also uses a variant of the COST239 topology.
  • Figure 3: Comparison of heuristic algorithms. (a) Service blocking probability (SBP) at fixed traffic and varying numbers of candidate paths (K). (b) SBP for KSP-FF at varying traffic loads and K=2 to K=40. (c) SBP at varying traffic load for K=50. The mean and standard deviation (shaded area) are calculated from 3000 trials of 10,000 traffic requests per data point. KSP-FF or FF-KSP with K=50 are found to give the lowest blocking.
  • Figure 4: Histogram of service holding times. The truncated distribution resamples the holding time when the sampled value exceeds $2 \times$ the mean. This reduces the mean holding time by 31% compared to the standard exponential distribution.
  • Figure 5: Mean SBP against traffic load. Each column is a publication and each subplot is for a topology. Error bars and shaded areas show standard deviations. 50-SP-FF$_{hops}$ matches or exceeds the RL performance in each case.
  • ...and 1 more figures
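
The 31% reduction quoted in the Figure 4 caption can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper's XLRON code: the function names, the mean of 10 time units, and the sample size are all assumptions. It resamples exponential holding times that exceed twice the nominal mean, then compares the empirical mean against the closed-form mean of an exponential distribution conditioned on $X \le c\mu$, which is $\mu \left(1 - \frac{c e^{-c}}{1 - e^{-c}}\right)$.

```python
import math
import random


def truncated_holding_time(mean, cutoff_factor=2.0, rng=random):
    """Draw an exponential holding time, resampling while it exceeds cutoff_factor * mean.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    cutoff = cutoff_factor * mean
    while True:
        sample = rng.expovariate(1.0 / mean)  # expovariate takes the rate, 1 / mean
        if sample <= cutoff:
            return sample


def truncated_mean_ratio(cutoff_factor=2.0):
    """Closed-form ratio E[X | X <= c * mean] / mean for an exponential X."""
    c = cutoff_factor
    return 1.0 - c * math.exp(-c) / (1.0 - math.exp(-c))


# Assumed values chosen only for this example.
rng = random.Random(0)
nominal_mean = 10.0
num_samples = 200_000

empirical_mean = sum(
    truncated_holding_time(nominal_mean, rng=rng) for _ in range(num_samples)
) / num_samples
analytic_mean = truncated_mean_ratio() * nominal_mean
reduction = 1.0 - truncated_mean_ratio()

print(f"empirical mean: {empirical_mean:.3f}")
print(f"analytic mean:  {analytic_mean:.3f}")
print(f"reduction in mean holding time: {reduction:.1%}")
```

With a cutoff of twice the mean, the closed form gives a reduction of roughly 31.3%, consistent with the caption. Under this assumption, a recreation that uses the standard exponential distribution instead of the truncated one would offer a higher effective traffic load for the same nominal settings.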