Environment-Aware Transfer Reinforcement Learning for Sustainable Beam Selection

Dariush Salami, Ramin Hashemi, Parham Kazemi, Mikko A. Uusitalo

TL;DR

This work tackles the energy-intensive retraining challenge of RL-based beam selection across diverse propagation environments by modeling environments as point clouds and using Chamfer distance to identify structurally similar deployments for model reuse, achieving a $16\times$ reduction in training time. The proposed Environment-Aware Transfer RL framework enables cross-environment policy sharing, preserving performance while reducing data and computational requirements. Simulations show high beam-selection performance in similar environments and substantial efficiency gains, underscoring the potential for green AI in wireless networks. Future work includes enriching the environmental representation with Channel Knowledge Maps to further boost transfer effectiveness and deployment scalability at the edge.

Abstract

This paper presents a novel and sustainable approach for improving beam selection in 5G and beyond networks using transfer learning and Reinforcement Learning (RL). Traditional RL-based beam selection models require extensive training time and computational resources, particularly when deployed in diverse environments with varying propagation characteristics, posing a major challenge for scalability and energy efficiency. To address this, we propose modeling the environment as a point cloud, where the points represent the locations of gNodeBs (gNBs) and surrounding scatterers. By computing the Chamfer distance between point clouds, structurally similar environments can be efficiently identified, enabling the reuse of pre-trained models through transfer learning. This methodology leads to a $16\times$ reduction in training time and computational overhead, directly contributing to energy efficiency. By minimizing the need for retraining in each new deployment, our approach significantly lowers power consumption and supports the development of green and sustainable Artificial Intelligence (AI) in wireless systems. Furthermore, it accelerates time-to-deployment, reduces carbon emissions associated with training, and enhances the viability of deploying AI-driven communication systems at the edge. Simulation results confirm that our approach maintains high performance while drastically cutting energy costs, demonstrating the potential of transfer learning to enable scalable, adaptive, and environmentally conscious RL-based beam selection strategies in dynamic and diverse propagation environments.
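The environment-matching step described in the abstract can be illustrated with a minimal sketch. It assumes one common variant of the Chamfer distance (the sum of the mean nearest-neighbour Euclidean distances in both directions); the paper's exact definition may differ. The function names, the 2-D coordinates, and the environment labels are hypothetical and chosen only for illustration.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point clouds a (n, d) and b (m, d).

    Assumed variant: mean nearest-neighbour Euclidean distance from a to b
    plus the same quantity from b to a.
    """
    # Pairwise Euclidean distances, shape (n, m).
    diff = a[:, None, :] - b[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    # Nearest neighbour in b for each point of a, and vice versa.
    return float(dists.min(axis=1).mean() + dists.min(axis=0).mean())


def most_similar_environment(new_env: np.ndarray, trained_envs: dict):
    """Return (name, distance) of the trained environment closest to new_env."""
    scores = {name: chamfer_distance(new_env, cloud)
              for name, cloud in trained_envs.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 2-D layouts (gNB and scatterer positions in metres).
env_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
env_b = env_a + np.array([1.0, 0.0])    # same layout, shifted by 1 m
env_c = env_a + np.array([30.0, 30.0])  # same layout, far away

# A new deployment resembling env_b is matched against trained environments.
name, dist = most_similar_environment(env_b, {"A": env_a, "C": env_c})
print(name, dist)
```

In this sketch the new deployment is matched to environment A, whose pre-trained model would then be the candidate for reuse. How small the distance must be before reuse is acceptable is a design choice that this example does not fix.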

Paper Structure

This paper contains 11 sections, 3 equations, 6 figures.

Figures (6)

  • Figure 1: The transmit-end beam refinement procedure, considering four - resources transmitted in four different directions.
  • Figure 2: The schematic view of the proposed method. The gNBs shown in the same colors (orange and green) have similar signal propagation characteristics (e.g., surrounding buildings). The numbers inside dashed circles refer to the steps in Fig. 3. In addition, a map of environments based on the pair-wise Chamfer distance is shown in the top-right circle. Orange nodes are the two environments with two orange gNBs, and green ones are the ones with two green gNBs.
  • Figure 3: The sequence diagram of the proposed scheme for calculating the distance map and sharing the trained beam selection model accordingly.
  • Figure 4: The three simulation environments are depicted in different colors. Environments A and B are more similar to each other because the locations of the scatterers and gNBs are closer compared to environment C. In all environments, the follows the path shown by the dashed square.
  • Figure 5: The left axis represents the ratio of the agent's to the maximum , while the right axis shows the Chamfer distance of the environments from environment A. All agents are trained on environment A and tested against environments A, B, and C as indicated on the x-axis.
  • ...and 1 more figure