FGGM: Formal Grey-box Gradient Method for Attacking DRL-based MU-MIMO Scheduler

Thanh Le, Hai Duong, Yusheng Ji, ThanhVu Nguyen, John C. S. Lui

TL;DR

This work investigates the security of DRL-based MU-MIMO schedulers by proposing FGGM, a formal grey-box gradient method that generates adversarial observations using polytope-based output bounds. By using the observation normalizer to bound victim observations and performing a one-shot optimization, FGGM reduces victim throughput by up to 70% in scenarios where 50% of users are attackers, without knowing the exact victim observations. The approach combines DeepPoly-style verification techniques with Wolpertinger action handling to craft adversarial inputs, and it contributes a practical threat model and an attack framework that can extend to other DRL-based resource allocation problems. The findings underscore the need for adversarial training and verification-guided defenses to improve the robustness of DRL-driven wireless schedulers in dynamic, multi-user environments.

Abstract

In 5G mobile communication systems, MU-MIMO has been applied to enhance spectral efficiency and support high data rates. To maximize spectral efficiency while providing fairness among users, the base station (BS) needs to select a subset of users for data transmission. Because this problem is NP-hard, DRL-based methods have been proposed to infer near-optimal solutions in real time, yet this approach has an intrinsic security problem. This paper investigates how a group of adversarial users can exploit unsanitized raw CSIs to launch a throughput degradation attack. Most existing studies have focused only on systems in which adversarial users can obtain the exact values of victims' CSIs, but this is impractical for uplink transmission in LTE/5G mobile systems. We note that the DRL policy contains an observation normalizer, which stores the mean and variance of the observations to improve training convergence. Adversarial users can then estimate the upper and lower bounds of the local observations, including the CSIs of victims, based solely on that observation normalizer. We develop an attack scheme, FGGM, by leveraging polytope abstract domains, a technique used to bound the outputs of a neural network given its input ranges. Our goal is to find one set of intentionally manipulated CSIs that achieves the attack goals over the whole range of the victims' local observations. Experimental results demonstrate that FGGM can determine a set of adversarial CSI vectors controlled by adversarial users, then reuse those CSIs throughout the simulation to reduce the network throughput of a victim by up to 70% without knowing the exact values of the victims' local observations. This work serves as a case study and can be applied to many other DRL-based problems, such as knapsack-oriented resource allocation problems.
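
The bounding idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it uses plain interval arithmetic, a coarser abstract domain than the polytope (DeepPoly-style) domain FGGM relies on, and it assumes that the input box takes the form mean ± delta·std derived from the normalizer statistics. The function names, the toy two-layer ReLU network, and the value of `delta` are all hypothetical.

```python
import numpy as np

def bounds_from_normalizer(mean, var, delta):
    """Assumed form of the input box: mean +/- delta * std (hypothetical choice)."""
    std = np.sqrt(var)
    return mean - delta * std, mean + delta * std

def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through the affine map x -> W @ x + b."""
    W_pos = np.maximum(W, 0.0)  # positive weights keep the bound orientation
    W_neg = np.minimum(W, 0.0)  # negative weights swap lower and upper bounds
    new_lo = W_pos @ lo + W_neg @ hi + b
    new_hi = W_pos @ hi + W_neg @ lo + b
    return new_lo, new_hi

def interval_relu(lo, hi):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def output_bounds(lo, hi, layers):
    """Bound the outputs of a ReLU MLP given as a list of (W, b) pairs."""
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:  # no activation after the last layer
            lo, hi = interval_relu(lo, hi)
    return lo, hi

def forward(x, layers):
    """Concrete forward pass of the same MLP, used to sanity-check the bounds."""
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

# Toy setup: a 4-dimensional observation and a random 4 -> 8 -> 2 network.
rng = np.random.default_rng(0)
mean, var = np.zeros(4), np.ones(4)
layers = [
    (rng.normal(size=(8, 4)), rng.normal(size=8)),
    (rng.normal(size=(2, 8)), rng.normal(size=2)),
]
lo, hi = bounds_from_normalizer(mean, var, delta=1.0)
out_lo, out_hi = output_bounds(lo, hi, layers)
```

Every concrete observation inside the box yields a network output inside `[out_lo, out_hi]`, which is the property that lets an attacker reason about the policy over a whole range of victim observations at once. Polytope domains keep linear relations between neurons and therefore usually give tighter bounds than this box-only sketch.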

Paper Structure

This paper contains 28 sections, 21 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The system model of the scheduling problem.
  • Figure 2: Threat model of the DRL-based scheduler for MU-MIMO networks.
  • Figure 3: The average selection probability of victims when the attacker uses different attack schemes to create adversarial CSIs. $L=8$, $M=4$, and $\bar{N}=4$. $\delta_{vic}, \delta_{adv} \in \{0.5, 1.0, 1.5, 2.0, 2.5, 3.0\}$. Lower is better.
  • Figure 4: The minimum selection probability and transmission rate of victims as the number of attackers increases from 1 to 8. $L=16$, $M=16$, and $\bar{N}=4$.
  • Figure 5: Performance of all schedulers and attack schemes in a network configuration with $L=8$, $M=4$, $\bar{N}=4$, and $L_{adv}=4$.
  • ...and 1 more figure