Efficient Beam Selection for ISAC in Cell-Free Massive MIMO via Digital Twin-Assisted Deep Reinforcement Learning

Jiexin Zhang, Shu Xu, Chunguo Li, Yongming Huang, Luxi Yang

TL;DR

This work tackles beam selection for cell-free ISAC in CFAR-enabled sensing by deriving the joint detection distribution and formulating beam tracking as an MDP. It introduces a DT-assisted offline DRL framework using a cGAN-based DT to enrich data and a conservative penalty to curb Q-value overestimation, achieving robust policy learning with significantly reduced online interactions. The proposed approach demonstrates convergence guarantees and delivers performance close to online DRL while reducing interaction overhead by about 80%, even under varying power, CFAR, and target velocity conditions. The results highlight the DT module's value for data augmentation and offline learning, enabling scalable, safe, and efficient beamforming in dynamic ISAC environments.
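
A conservative penalty of the kind mentioned above is often realized as a CQL-style regularizer, which pushes down Q-values of all actions (via a log-sum-exp) while pushing up the Q-value of the action actually present in the offline dataset. The paper's exact penalty term is not reproduced in this summary, so the form below is an assumption for illustration, as are the function name `conservative_q_loss` and the weight `alpha`.

```python
import numpy as np

def conservative_q_loss(q_values, actions, td_targets, alpha=1.0):
    """Illustrative CQL-style loss (assumed form, not the paper's exact one).

    q_values:   (batch, n_actions) Q-estimates for every candidate beam
    actions:    (batch,) integer index of the beam stored in the dataset
    td_targets: (batch,) bootstrapped TD targets
    alpha:      hypothetical weight of the conservative penalty
    """
    q_values = np.asarray(q_values, dtype=float)
    actions = np.asarray(actions, dtype=int)
    td_targets = np.asarray(td_targets, dtype=float)

    rows = np.arange(q_values.shape[0])
    q_taken = q_values[rows, actions]

    # Standard squared TD error on dataset transitions.
    td_loss = np.mean((q_taken - td_targets) ** 2)

    # Numerically stable log-sum-exp over actions.
    q_max = q_values.max(axis=1, keepdims=True)
    lse = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))

    # Penalty is non-negative because logsumexp >= max >= q_taken.
    penalty = np.mean(lse - q_taken)
    return td_loss + alpha * penalty, td_loss, penalty
```

Because the penalty grows when out-of-distribution actions receive high Q-values, minimizing it discourages the agent from overestimating beams it has never observed in the data.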

Abstract

Beamforming enhances signal strength and quality by focusing energy in specific directions. This capability is particularly crucial in cell-free integrated sensing and communication (ISAC) systems, where multiple distributed access points (APs) collaborate to provide both communication and sensing services. In this work, we first derive the distribution of joint target detection probabilities across multiple receiving APs under false alarm rate constraints, and then formulate the beam selection procedure as a Markov decision process (MDP). We establish a deep reinforcement learning (DRL) framework, in which reward shaping and sinusoidal embedding are introduced to facilitate agent learning. To avoid the high costs and risks of real-time agent-environment interactions, we further propose a novel digital twin (DT)-assisted offline DRL approach. In contrast to traditional online DRL, a conditional generative adversarial network (cGAN)-based DT module, operating as a replica of the real world, is designed to generate virtual state-action transition pairs and enrich data diversity, enabling offline adjustment of the agent's policy. Additionally, we address the out-of-distribution issue by incorporating an extra penalty term into the loss function. The convergence of the agent-DT interaction and an upper bound on the Q-error function are theoretically derived. Numerical results demonstrate that the proposed approach significantly reduces online interaction overhead while maintaining effective beam selection across diverse conditions, including strict false alarm control, low signal-to-noise ratios, and high target velocities.
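
The sinusoidal embedding mentioned in the abstract can be illustrated with the familiar sine/cosine feature map used for positional encodings. This is a minimal sketch under assumptions: the abstract does not say which state component is embedded or which frequencies are used, so the function name `sinusoidal_embedding`, the parameters `dim` and `max_period`, and the geometric frequency schedule are all hypothetical.

```python
import numpy as np

def sinusoidal_embedding(x, dim=8, max_period=10000.0):
    """Map scalar inputs (e.g. a beam index) to `dim` sin/cos features.

    Assumed design: geometric frequency schedule as in Transformer
    positional encodings; `dim` must be even.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    x = np.atleast_1d(np.asarray(x, dtype=float))
    half = dim // 2
    # Frequencies decay geometrically from 1 down towards 1/max_period.
    freqs = max_period ** (-np.arange(half) / half)
    angles = x[:, None] * freqs[None, :]
    # First half of each row holds sines, second half holds cosines.
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Example: embed three hypothetical beam indices into 8 features each.
features = sinusoidal_embedding([0, 1, 2], dim=8)
```

Embedding a discrete index this way gives the network smooth, bounded inputs in which nearby indices have similar representations, which is one plausible reason such an embedding would ease agent learning.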

Paper Structure

This paper contains 31 sections, 2 theorems, 47 equations, 11 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

The optimality is guaranteed if and only if the introduced shaping reward is a potential-based shaping function of consecutive states. In this work, we define the shaping reward for the transition from $\mathbf{s}_t$ to $\mathbf{s}_{t+1}$ as where $\rho$ is the discount factor, and the potential-based function is designed to depend only on the current state, where $b_1$ controls the ampli
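
For intuition, the standard potential-based shaping form is $F(\mathbf{s}_t, \mathbf{s}_{t+1}) = \rho\,\Phi(\mathbf{s}_{t+1}) - \Phi(\mathbf{s}_t)$. The sketch below checks numerically that the discounted sum of such shaping rewards telescopes to $\rho^T \Phi(\mathbf{s}_T) - \Phi(\mathbf{s}_0)$, so it depends only on the endpoints of a trajectory, which is why this kind of shaping leaves the optimal policy unchanged. The symbols $F$ and $\Phi$, the function names, and the toy potential are assumptions for illustration; the paper's own potential function (the one involving $b_1$) is not reproduced here.

```python
def shaping_reward(phi, s, s_next, rho):
    """Standard potential-based shaping term: rho * phi(s') - phi(s)."""
    return rho * phi(s_next) - phi(s)

def discounted_shaping_sum(phi, states, rho):
    """Discounted sum of shaping rewards along a trajectory of states."""
    return sum(
        rho ** t * shaping_reward(phi, states[t], states[t + 1], rho)
        for t in range(len(states) - 1)
    )

# Toy potential (hypothetical): higher when the scalar state is near zero.
def toy_potential(s):
    return -abs(s)

trajectory = [3, 1, 4, 1, 5]
rho = 0.9
total = discounted_shaping_sum(toy_potential, trajectory, rho)

# Telescoping identity: only the first and last states matter.
T = len(trajectory) - 1
endpoint_value = rho ** T * toy_potential(trajectory[-1]) - toy_potential(trajectory[0])
```

Changing the intermediate states of `trajectory` while keeping its length and endpoints fixed leaves `total` unchanged, which is the property the proposition relies on.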

Figures (11)

  • Figure 1: A cell-free ISAC system.
  • Figure 2: The CDF curve of detection probability.
  • Figure 3: Network architectures of dueling DDQN and cGAN.
  • Figure 4: DT-assisted offline DRL method for beam selection.
  • Figure 5: Convergence behaviors of the cGAN-based DT module.
  • ...and 6 more figures

Theorems & Definitions (5)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Theorem 1
  • proof