Bayesian Optimization for Non-Cooperative Game-Based Radio Resource Management

Yunchuan Zhang, Jiechen Chen, Junshuo Liu, Robert C. Qiu

TL;DR

This work addresses stable spectrum sharing in multi-base-station networks where utilities are costly and black-box. It introduces PPR-UCB, a Bayesian optimization policy that leverages martingale-based prior-posterior ratio confidence sets to approximate pure Nash equilibria in non-cooperative downlink power-control games. By modeling each base station's utility with Gaussian processes and deriving anytime-valid confidence sets, the method achieves data-efficient convergence to near-equilibrium configurations. Experimental results in a multi-cell MIMO setting demonstrate improved efficiency and scalability over existing Bayesian and RL-based approaches, highlighting practical potential for coordinated resource management.

Abstract

Radio resource management in modern cellular networks often calls for optimizing complex utility functions that may conflict across base stations (BSs). Coordinating resource allocation strategies efficiently across BSs to ensure stable network service poses significant challenges, especially when each utility is accessible only via costly, black-box evaluations. This paper formulates resource allocation among spectrum-sharing BSs as a non-cooperative game, with the goal of aligning their allocation incentives toward a stable outcome. To address this challenge, we propose PPR-UCB, a novel Bayesian optimization (BO) strategy that learns from sequential decision-evaluation pairs to approximate pure Nash equilibrium (PNE) solutions. PPR-UCB applies martingale techniques to Gaussian process (GP) surrogates and constructs high-probability confidence bounds to quantify uncertainty in the utilities. Experiments on downlink transmission power allocation in a multi-cell multi-antenna system demonstrate that PPR-UCB identifies effective equilibrium solutions from only a few data samples.

Paper Structure

This paper contains 8 sections, 1 theorem, 33 equations, 4 figures, 1 algorithm.

Key Result

Lemma 1

Under the Bayesian linear regression assumption, the Bayesian confidence sequence is anytime valid at level $1-\delta$, in the sense that it includes the ground-truth parameters $\boldsymbol{\theta}_n^*$ with probability no smaller than $1-\delta$ for all times $t\geq 1$. In this coverage guarantee, the probability is evaluated with respect to the ground truth distrib…
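
The anytime-valid property in Lemma 1 can be illustrated with a generic prior-posterior ratio (PPR) construction for Bayesian linear regression. The sketch below is not the paper's equation: the zero-mean Gaussian prior, the known noise variance, and the names `posterior` and `in_ppr_set` are assumptions made for illustration. It keeps a parameter vector in the set whenever the prior-to-posterior density ratio stays below $1/\delta$.

```python
import numpy as np


def gaussian_logpdf(theta, mean, cov):
    """Log-density of a multivariate Gaussian N(mean, cov) at theta."""
    d = theta.shape[0]
    diff = theta - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def posterior(Phi, y, prior_cov, noise_var):
    """Conjugate posterior for y = Phi @ theta + noise, with theta ~ N(0, prior_cov).

    Assumption: the noise is i.i.d. Gaussian with known variance noise_var.
    """
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + Phi.T @ Phi / noise_var)
    post_mean = post_cov @ Phi.T @ y / noise_var
    return post_mean, post_cov


def in_ppr_set(theta, Phi, y, prior_cov, noise_var, delta):
    """Return True if theta lies in the level-(1 - delta) PPR confidence set.

    Membership rule (generic PPR form, assumed here):
        prior(theta) / posterior(theta) < 1 / delta.
    """
    d = theta.shape[0]
    mean, cov = posterior(Phi, y, prior_cov, noise_var)
    log_ratio = (gaussian_logpdf(theta, np.zeros(d), prior_cov)
                 - gaussian_logpdf(theta, mean, cov))
    return bool(log_ratio < np.log(1.0 / delta))
```

Before any data arrive, the posterior equals the prior, so the ratio is 1 and every parameter vector belongs to the set. As observations accumulate, the posterior concentrates and parameters far from the data-supported region are excluded.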

Figures (4)

  • Figure 1: This paper studies a setting in which a central optimizer uses BO to approximate the pure Nash equilibrium (PNE) for a non-cooperative downlink transmission power control game with costly-to-evaluate black-box utility functions of $N$ BSs. At any time $t+1$, the central optimizer assigns an action profile $\mathbf{x}_{t+1}$ to all BSs. As a result, the optimizer receives noisy utility feedback $y_{n,t}$ about the corresponding utility value $u_n(\mathbf{x}_{t+1})$ for all BSs $n\in\mathcal{N}$. The goal is to approach a solution in the $\epsilon$-PNE set, where $\epsilon\geq 0$ represents the dissatisfaction tolerance.
  • Figure 2: Sum spectral efficiency against the number of optimization iterations $T$ for PE (blue dash-dotted line), UCB-PNE (orange dashed line), and PPR-UCB with parameter $\delta=0.05$ (green solid line).
  • Figure 3: Regret gap against the number of optimization iterations $T$ for PE (blue dash-dotted line), UCB-PNE (orange dashed line), and PPR-UCB with parameter $\delta=0.05$ (green solid line).
  • Figure 4: Regret gap against the number of BSs $N$ for PE (blue dash-dotted line), UCB-PNE (orange dashed line), and PPR-UCB with parameter $\delta=0.05$ (green solid line).
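
The $\epsilon$-PNE target described in Figure 1 can be made concrete on a small finite game. The sketch below is a minimal illustration under stated assumptions: actions are discretized, each utility is stored as a full table, and the "regret gap" is taken to be the largest gain any single BS could obtain by deviating unilaterally, which may differ from the exact metric plotted in Figures 3 and 4. The function names `regret_gap` and `is_epsilon_pne` are hypothetical.

```python
import numpy as np


def regret_gap(utilities, profile):
    """Largest unilateral-deviation gain over all players at a joint action.

    utilities[n] is an array indexed by the joint action (one axis per player)
    holding player n's utility. profile is a tuple of action indices.
    Assumption: finite action sets and noiseless utility tables.
    """
    gaps = []
    for n, u in enumerate(utilities):
        idx = list(profile)
        idx[n] = slice(None)  # vary only player n's action, fix the others
        best_response_value = u[tuple(idx)].max()
        gaps.append(best_response_value - u[tuple(profile)])
    return max(gaps)


def is_epsilon_pne(utilities, profile, eps):
    """A profile is an eps-PNE if no player can gain more than eps by deviating."""
    return regret_gap(utilities, profile) <= eps


# Toy two-player game (prisoner's-dilemma payoffs, chosen for illustration only).
u0 = np.array([[3, 0], [5, 1]])  # player 0's utility, indexed [a0, a1]
u1 = np.array([[3, 5], [0, 1]])  # player 1's utility, indexed [a0, a1]

print(regret_gap([u0, u1], (1, 1)))         # 0: mutual defection is an exact PNE
print(regret_gap([u0, u1], (0, 0)))         # 2: each player gains 2 by deviating
print(is_epsilon_pne([u0, u1], (0, 0), 2))  # True once the tolerance reaches 2
```

In the black-box setting of the paper the utility tables are unknown, so a check of this kind would be applied to surrogate estimates or confidence bounds rather than to true utilities.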

Theorems & Definitions (2)

  • Lemma 1: Utility Parameters Coverage Guarantee of PPR-UCB
  • proof