DRESS: Diffusion Reasoning-based Reward Shaping Scheme For Intelligent Networks
Feiran You, Hongyang Du, Xiangwang Hou, Yong Ren, Kaibin Huang
TL;DR
The paper tackles reward sparsity in deep reinforcement learning (DRL) for 6G-era wireless networks by introducing DRESS, a diffusion-based reward shaping scheme that generates informative auxiliary rewards conditioned on state-action pairs. DRESS operates as a diffusion reasoning module that can be plugged into any DRL algorithm, using a forward diffusion and reverse denoising process guided by a Q-value evaluator to produce latent representations that translate into meaningful rewards. Empirical results across a dedicated MECLatency wireless benchmark and seven standard DRL benchmarks show that DRESSed-DRL converges approximately 1.5x faster in sparse-feedback settings and provides robust, generalizable improvements across architectures and tasks, with up to 33% gains over competitive reward shaping methods. The framework offers a practical, architecture-agnostic approach to enhancing learning efficiency and stability in complex wireless environments, enabling more reliable optimization under extreme conditions and motivating future extensions to broader domains such as integrated sensing, channel knowledge graphs, and security.
Abstract
Network optimization remains fundamental in wireless communications, with Artificial Intelligence (AI)-based solutions gaining widespread adoption. As Sixth-Generation (6G) communication networks pursue full-scenario coverage, optimization in complex extreme environments presents unprecedented challenges. The dynamic nature of these environments, combined with physical constraints, makes it difficult for AI solutions such as Deep Reinforcement Learning (DRL) to obtain effective reward feedback for the training process. However, many existing DRL-based network optimization studies overlook this challenge through idealized environment settings. Inspired by the powerful capabilities of Generative AI (GenAI), especially diffusion models, in capturing complex latent distributions, we introduce a novel Diffusion Reasoning-based Reward Shaping Scheme (DRESS) to achieve robust network optimization. By conditioning on observed environmental states and executed actions, DRESS leverages diffusion models' multi-step denoising process as a form of deep reasoning, progressively refining latent representations to generate meaningful auxiliary reward signals that capture patterns of network systems. Moreover, DRESS is designed for seamless integration with any DRL framework, allowing DRESS-aided DRL (DRESSed-DRL) to enable stable and efficient DRL training even under extreme network environments. Experimental results demonstrate that DRESSed-DRL converges about 1.5x faster than the original DRL algorithm in sparse-reward wireless environments and achieves significant performance improvements over baseline methods in multiple general DRL benchmark environments. The code of DRESS is available at https://github.com/NICE-HKU/DRESS.
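To make the idea of generating an auxiliary reward through multi-step denoising more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a scalar latent, a linear noise schedule, and a DDPM-style reverse update. The function `toy_noise_predictor` is a hypothetical stand-in for a trained network conditioned on the state and action, and all names, the `weight` parameter, and the final `tanh` squashing are illustrative assumptions. The Q-value evaluator guidance mentioned in the TL;DR is omitted.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (a common DDPM choice, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_noise_predictor(x_t, t, state, action):
    """Hypothetical stand-in for a trained, conditioned denoising network.

    It treats the gap between the latent and a condition-dependent target
    as the predicted noise, so denoising pulls the latent toward the target.
    """
    target = math.tanh(sum(state) + sum(action))
    return x_t - target


def auxiliary_reward(state, action, num_steps=10, seed=0):
    """Run reverse denoising from pure noise and map the latent to [-1, 1]."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]

    # Cumulative products of alphas, used by the DDPM posterior mean.
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    x = rng.gauss(0.0, 1.0)  # start from a standard normal latent
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, t, state, action)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / math.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0
        x = mean + math.sqrt(betas[t]) * noise
    return math.tanh(x)


def shaped_reward(env_reward, state, action, weight=0.1, seed=0):
    """Add the weighted auxiliary reward to the (possibly sparse) env reward."""
    return env_reward + weight * auxiliary_reward(state, action, seed=seed)
```

In a training loop, a call such as `shaped_reward(0.0, [0.2, -0.1], [0.5])` would replace the raw environment reward before it is stored in the replay buffer, which is one way a module of this kind could be attached to an existing DRL algorithm without changing the algorithm itself.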
