Novel Closed Loop Control Mechanism for Zero Touch Networks using BiLSTM and Q-Learning
Tamizhelakkiya K, Dibakar Das, Jyotsna Bapat, Debabrata Das, Komal Sharma
TL;DR
The paper tackles the need for automated closed-loop control in Zero Touch Networks (ZTN) for 6G by proposing a two-stage solution: a hybrid BiLSTM+XGBoost model predicts the network state, and a Q-learning agent selects traffic-shaping actions to steer the network toward the predicted state. This enables continuous, proactive QoS optimization under dynamic congestion conditions. Simulation results demonstrate that the hybrid predictor achieves high fidelity ($\hat{y}_i$ closely matching $y_i$) and that the Q-learning controller converges over $40{,}000$ episodes to choose actions that align with actual states, attaining up to 95% accuracy in state matching. The approach offers a practical, automated mechanism for ZTN management with potential extensions to load balancing and other control tasks in future work.
Abstract
As networks advance toward the Sixth Generation (6G), managing high-speed, ubiquitous connectivity poses major challenges in meeting diverse Service Level Agreements (SLAs). The Zero Touch Network (ZTN) framework has been proposed to automate and optimize network management tasks so that SLAs are met even under dynamic network conditions. Although the ZTN literature proposes closed-loop control, methods for implementing such a mechanism remain largely unexplored. This paper proposes a novel two-stage closed-loop control mechanism for ZTN that optimizes the network continuously. In the first stage, an XGBoosted Bidirectional Long Short-Term Memory (BiLSTM) model is trained to predict the network state (in terms of bandwidth). In the second stage, the Q-learning algorithm selects actions based on the predicted network state to optimize Quality of Service (QoS) parameters. By selecting appropriate actions, it serves the applications continuously within the available resource limits in a closed loop. For a network congestion scenario, with available bandwidth as the state and traffic-shaping options as the mitigating actions, results show that the proposed closed-loop mechanism adapts to changing network conditions. Simulation results show that the proposed mechanism achieves 95% accuracy in matching the actual network state by selecting the appropriate action based on the predicted state.
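To make the second stage concrete, the following is a minimal sketch of tabular Q-learning over discretized bandwidth levels. It is not the authors' implementation: the number of levels, the one-action-per-level mapping, the reward of +1/-1, the random state transitions, and the hyperparameters `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions chosen for illustration, and the predicted state that the BiLSTM+XGBoost stage would supply is replaced by a random draw.

```python
import random

# All constants below are hypothetical, chosen only for illustration.
N_STATES = 4    # discretized bandwidth levels (0 = most congested)
N_ACTIONS = 4   # traffic-shaping options, assumed one suitable option per level
ALPHA = 0.1     # learning rate
GAMMA = 0.5     # discount factor
EPSILON = 0.1   # exploration probability


def step(state, action, rng):
    """Toy environment: +1 if the action suits the bandwidth level, else -1."""
    reward = 1.0 if action == state else -1.0
    # Stand-in for the next predicted network state from the first stage.
    next_state = rng.randrange(N_STATES)
    return next_state, reward


def train(steps=5000, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = rng.randrange(N_STATES)
    for _ in range(steps):
        if rng.random() < EPSILON:
            action = rng.randrange(N_ACTIONS)  # explore
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])  # exploit
        next_state, reward = step(state, action, rng)
        # Standard Q-learning update toward the bootstrapped target.
        td_target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for each state."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned greedy policy maps each bandwidth level to its matching shaping option. A closed-loop deployment would replace `step` with real measurements and feed the predictor's output in as `state`.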
