A Unified Learning-based Optimization Framework for 0-1 Mixed Problems in Wireless Networks
Kairong Ma, Yao Sun, Shuheng Hua, Muhammad Ali Imran, Walid Saad
TL;DR
The paper tackles 0-1 mixed optimization problems in wireless networks by combining convex optimization with reinforcement learning (RL). It casts the binary decisions as a Markov decision process (MDP) and solves a convex relaxation to generate a high-potential zone (HPZ) that guides RL exploration, with theoretical convergence guarantees. The approach extends to non-convex objectives and constraints via convex envelopes, hull relaxations, and arc-consistency strategies, preserving global optima where possible. Empirical results show faster convergence (about 30% less convergence time than branch-and-bound (B&B) in small-scale tests) and better objective quality (about 20% higher normalized objective values than RL in large-scale scenarios), indicating improved scalability and robustness for practical wireless-network optimization. The framework offers a unified, principled way to solve large, complex 0-1 mixed problems in real-world networking tasks.
Abstract
Many wireless networking problems are posed as 0-1 mixed optimization problems, which involve binary variables (e.g., selection of access points, channels, and tasks) and continuous variables (e.g., allocation of bandwidth, power, and computing resources). Traditional optimization methods as well as reinforcement learning (RL) algorithms have been widely used to solve these problems under different network scenarios. However, solving such problems becomes more challenging when dealing with a large network scale, multi-dimensional radio resources, and diversified service requirements. To this end, in this paper, a unified framework that combines RL and optimization theory is proposed to solve 0-1 mixed optimization problems in wireless networks. First, RL is used to capture the process of solving for the binary variables as a sequential decision-making task. During the decision-making steps, the binary (0-1) variables are relaxed, and the relaxed problem is then solved to obtain a relaxed solution, which serves as prior information to guide the RL search policy. Then, at the end of the decision-making process, the search policy is updated using the suboptimal objective value obtained from the decisions made. The performance bound and convergence guarantees of the proposed framework are then proven theoretically. An extension of this approach is provided to solve problems with a non-convex objective function and/or non-convex constraints. Numerical results show that the proposed approach reduces the convergence time by about 30% relative to branch-and-bound (B&B) in small-scale problems with slightly higher objective values. In large-scale scenarios, it can improve the normalized objective values by 20% over RL with a shorter convergence time.
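
The loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm or code: it assumes a toy pure 0-1 knapsack problem in place of a wireless 0-1 mixed problem (there are no continuous variables), and it uses a simple heuristic probability update in place of the paper's RL policy update. The function names `relaxed_solution` and `solve`, and the parameters `mix`, `lr`, and `episodes`, are hypothetical. The sketch only shows the three ingredients named above: sequential binary decisions, a relaxed solution used as a prior at each step, and a policy update from the objective value at the end of each episode.

```python
import random


def relaxed_solution(values, weights, capacity, fixed):
    """LP relaxation of a 0-1 knapsack with some variables already fixed.

    `fixed` maps an item index to 0 or 1. Undecided variables are relaxed to
    [0, 1]; for a knapsack, the greedy value/weight fill is the LP optimum.
    """
    remaining = capacity - sum(weights[i] for i, v in fixed.items() if v == 1)
    x = [float(fixed.get(i, 0)) for i in range(len(values))]
    free = sorted(
        (i for i in range(len(values)) if i not in fixed),
        key=lambda i: values[i] / weights[i],
        reverse=True,
    )
    for i in free:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / weights[i])
        remaining -= x[i] * weights[i]
    return x


def solve(values, weights, capacity, episodes=200, mix=0.5, lr=0.1, seed=0):
    """Relaxation-guided sequential search (toy stand-in for the RL policy)."""
    rng = random.Random(seed)
    n = len(values)
    policy = [0.5] * n      # learned probability of choosing x_i = 1
    baseline = 0.0          # running average of episode objective values
    scale = max(values) * n  # normalizes the update step
    best_value, best_x = 0, [0] * n
    for _ in range(episodes):
        fixed, remaining = {}, capacity
        for i in range(n):  # one binary decision per step
            if weights[i] > remaining:
                fixed[i] = 0  # choosing 1 would violate the capacity
                continue
            # Relaxed solution of the remaining problem acts as the prior.
            prior = relaxed_solution(values, weights, capacity, fixed)[i]
            p = mix * prior + (1 - mix) * policy[i]
            p = min(0.95, max(0.05, p))  # keep some exploration
            fixed[i] = 1 if rng.random() < p else 0
            remaining -= weights[i] * fixed[i]
        x = [fixed[i] for i in range(n)]
        value = sum(v * xi for v, xi in zip(values, x))
        # End-of-episode update: shift the policy toward the decisions when
        # the objective beats the baseline, away from them otherwise.
        step = lr * (value - baseline) / scale
        for i in range(n):
            policy[i] = min(1.0, max(0.0, policy[i] + step * (x[i] - policy[i])))
        baseline += 0.1 * (value - baseline)
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x
```

Calling `solve([10, 13, 7, 8], [5, 6, 3, 4], 10)` returns the best objective value found and the corresponding 0-1 vector. In this sketch, `mix` controls how strongly the relaxed solution steers each decision relative to the learned probabilities. It is a crude analogue of using the relaxation as prior information, not a reproduction of the HPZ mechanism.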
