
Decision Transformers for Wireless Communications: A New Paradigm of Resource Management

Jie Zhang, Jun Li, Long Shi, Zhe Wang, Shi Jin, Wen Chen, H. Vincent Poor

TL;DR

This article adopts an alternative AI technology, the decision transformer (DT), and proposes a DT-based adaptive decision architecture for wireless resource management. By leveraging DT models learned over offline datasets, the architecture aims to converge rapidly with far fewer training epochs and to outperform deep reinforcement learning (DRL) in new scenarios with different state and action spaces.

Abstract

As the next generation of mobile systems evolves, artificial intelligence (AI) is expected to integrate deeply with wireless communications for resource management in variable environments. In particular, deep reinforcement learning (DRL) is an important tool for solving stochastic resource-allocation optimization problems. However, DRL must restart training from scratch whenever the state and action spaces change, which leads to low sample efficiency and poor generalization. Moreover, each DRL training process may take a large number of epochs to converge, which is unacceptable in time-sensitive scenarios. In this paper, we adopt an alternative AI technology, the Decision Transformer (DT), and propose a DT-based adaptive decision architecture for wireless resource management. The key idea of this architecture is to construct pre-trained models in the cloud and then fine-tune personalized models at the edge. By leveraging DT models learned over offline datasets, the proposed architecture is expected to converge rapidly with far fewer training epochs and to achieve higher performance than DRL in new scenarios with different state and action spaces. We then design DT frameworks for two typical communication scenarios: intelligent reflecting surface (IRS)-aided communications and unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC). Simulations demonstrate that the proposed DT frameworks converge over $3$-$6$ times faster and perform better than the classic DRL method, proximal policy optimization (PPO).
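In the standard Decision Transformer formulation, each offline trajectory is relabeled with returns-to-go and fed to the transformer as an interleaved sequence of (return-to-go, state, action) tokens, each tagged with its timestep; Figure 1 refers to this as return-to-go processing and time embedding. The following is a minimal pure-Python sketch of that preprocessing under stated assumptions, not the authors' code: the function names `returns_to_go` and `to_dt_sequence`, the tuple token format, and the optional discount `gamma` are illustrative choices (the original DT uses undiscounted returns, i.e. `gamma = 1`).

```python
from typing import Any, List, Sequence, Tuple

# Sketch of DT-style preprocessing of one offline trajectory (illustrative;
# names and token format are assumptions, not taken from the paper).

Token = Tuple[str, int, Any]  # (token kind, timestep, value)


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """R_t = sum over t' >= t of gamma**(t' - t) * r_t', computed right to left."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def to_dt_sequence(states: Sequence[Any], actions: Sequence[Any],
                   rewards: Sequence[float]) -> List[Token]:
    """Interleave (R_t, s_t, a_t) and tag every token with its timestep t."""
    if not len(states) == len(actions) == len(rewards):
        raise ValueError("states, actions and rewards must have equal length")
    seq: List[Token] = []
    for t, (r, s, a) in enumerate(zip(returns_to_go(rewards), states, actions)):
        seq.append(("rtg", t, r))
        seq.append(("state", t, s))
        seq.append(("action", t, a))
    return seq


if __name__ == "__main__":
    print(returns_to_go([1.0, 0.0, 2.0]))  # prints [3.0, 2.0, 2.0]
    print(to_dt_sequence(["s0"], ["a0"], [5.0]))
```

At deployment time, a DT is typically conditioned on a desired target return, which is decremented by each observed reward to form the next return-to-go token.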

Paper Structure

This paper contains 12 sections and 5 figures.

Figures (5)

  • Figure 1: Cloud-edge coordinated DT architecture for wireless resource management. The platform consists of a cloud layer and an edge layer. Each communication task involves resource allocation with multiple decision variables, e.g., power allocation, beamforming design, user scheduling, interference management, and load balancing. For a specific task, samples are collected from similar scenarios and uploaded to a cloud-based data buffer. These samples undergo return-to-go processing to provide training data for the DT model, and scenario-specific prompts are designed to offer tailored instructions. The time-embedded samples are fed into the DT model, which updates its parameters by minimizing the loss between predicted and target actions. The pre-trained model is then deployed to the edge in lightweight form, where each new scenario requires only a few samples for fine-tuning to establish a personalized local model.
  • Figure 2: DT design for IRS-aided communications. The DT model is pre-trained on multiple scenarios to learn the policies for the base station's (BS's) power control and the IRS's beamforming. The model is trained with scenario-specific prompts and an action embedding technique that handles both discrete and continuous actions. In real-time deployment, the model adapts swiftly to a new scenario by fine-tuning the pre-trained DT model with few-shot learning.
  • Figure 3: The DT-based resource management for UAV-aided MEC with model lightweighting. This architecture incorporates parameter sharing and sparse attention mechanisms to reduce the neural network's complexity and computational demands. Samples are collected from multiple UAV scenarios to pre-train the DT model. For online deployment, the lightweight DT model is fine-tuned to fit the new scenario.
  • Figure 4: Performance and convergence speed comparison between the DT, PPO, and random methods in IRS-aided communications. We consider two scenarios with $128$ and $64$ IRS elements, respectively.
  • Figure 5: Performance and convergence speed comparison between the DT, PPO, and random methods in UAV-aided MEC. We consider two scenarios with $3$ UAVs and $4$ UAVs, respectively.
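Figure 2 mentions an action embedding technique for handling both discrete and continuous actions but does not specify it. One simple way to realize such an embedding, shown here purely as a hypothetical sketch rather than the authors' design, is to pack the continuous BS transmit power and the discrete IRS phase-shift levels into a single flat vector: the power is normalized by an assumed maximum `p_max`, and each IRS element's phase index (out of an assumed `n_levels` uniform quantization levels) becomes a one-hot chunk. Decoding clips the continuous entry and takes a per-element argmax, so any model output maps back to a valid action. All function and parameter names below are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical hybrid-action embedding for IRS-aided communications
# (illustrative only; the paper's actual embedding is not specified here).


def encode_hybrid_action(power: float, phase_idx: Sequence[int],
                         p_max: float, n_levels: int) -> List[float]:
    """Return [power / p_max] followed by one one-hot chunk per IRS element."""
    if not 0.0 <= power <= p_max:
        raise ValueError("power must lie in [0, p_max]")
    vec = [power / p_max]  # continuous part, normalized to [0, 1]
    for k in phase_idx:  # discrete part: one phase level per IRS element
        if not 0 <= k < n_levels:
            raise ValueError("phase index out of range")
        one_hot = [0.0] * n_levels
        one_hot[k] = 1.0
        vec.extend(one_hot)
    return vec


def decode_hybrid_action(vec: Sequence[float], p_max: float,
                         n_levels: int) -> Tuple[float, List[int]]:
    """Map a (possibly noisy) model output back to a valid hybrid action."""
    body = list(vec[1:])
    if len(body) % n_levels != 0:
        raise ValueError("vector length does not match n_levels")
    power = min(max(vec[0], 0.0), 1.0) * p_max  # clip to [0, 1], then rescale
    phases = []
    for i in range(0, len(body), n_levels):
        chunk = body[i:i + n_levels]
        phases.append(max(range(n_levels), key=chunk.__getitem__))  # argmax
    return power, phases


def phase_radians(k: int, n_levels: int) -> float:
    """Phase shift for level k under an assumed uniform quantization of [0, 2*pi)."""
    return 2.0 * math.pi * k / n_levels


if __name__ == "__main__":
    v = encode_hybrid_action(0.5, [0, 3, 1], p_max=1.0, n_levels=4)
    print(len(v), decode_hybrid_action(v, p_max=1.0, n_levels=4))  # prints 13 (0.5, [0, 3, 1])
```

With this layout, a natural (again assumed) training loss between predicted and target actions is a squared error on the first entry plus a cross-entropy over each one-hot chunk.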