DeepSeek-Inspired Exploration of RL-based LLMs and Synergy with Wireless Networks: A Survey

Yu Qiao, Phuong-Nam Tran, Ji Su Yoon, Loc X. Nguyen, Eui-Nam Huh, Dusit Niyato, Choong Seon Hong

TL;DR

This survey investigates the mutual empowerment between RL-based LLMs and wireless networks, motivated by DeepSeek-style pure-RL LLMs and AI-enabled communication infrastructures. It reviews core techniques (RL, RLHF, pure RL) and enabling wireless technologies (semantic communication, ISAC, NTNs, MEC), analyzes the motivations, challenges, and solutions for integrating LLMs with networks, and highlights applications (intelligent ecosystems, autonomous networks, Industry 4.0, education). The work takes a dual perspective: how LLMs can optimize wireless systems, and how wireless networks can support LLM training, deployment, and robustness. It also addresses data privacy, compute, security, and fairness concerns. By outlining future directions (quantum, on-device, and neural-symbolic LLMs, and embodied AI) and societal impacts, the paper aims to guide researchers and practitioners toward practical, scalable, and responsible deployments of RL-based LLMs in next-generation wireless environments.

Abstract

Reinforcement learning (RL)-based large language models (LLMs), such as ChatGPT, DeepSeek, and Grok-3, have attracted widespread attention for their remarkable capabilities in multimodal data understanding. Meanwhile, the rapid expansion of information services has led to a growing demand for AI-enabled wireless networks. The open-source DeepSeek models are famous for their innovative designs, such as large-scale pure RL and cost-efficient training, which make them well-suited for practical deployment in wireless networks. By integrating DeepSeek-style LLMs with wireless infrastructures, a synergistic opportunity arises: the DeepSeek-style LLMs enhance network optimization with strong reasoning and decision-making abilities, while wireless infrastructure enables the broad deployment of these models. Motivated by this convergence, this survey presents a comprehensive DeepSeek-inspired exploration of RL-based LLMs in the context of wireless networks. We begin by reviewing key techniques behind network optimization to establish a foundation for understanding DeepSeek-style LLM integration. Next, we examine recent advancements in RL-based LLMs, using DeepSeek models as a representative example. Building on this, we explore the synergy between the two domains, highlighting motivations, challenges, and potential solutions. Finally, we highlight emerging directions for integrating LLMs with wireless networks, such as quantum, on-device, and neural-symbolic LLM models, as well as embodied AI agents. Overall, this survey offers a comprehensive examination of the interplay between DeepSeek-style LLMs and wireless networks, demonstrating how these domains can mutually enhance each other to drive innovation.

Paper Structure

This paper contains 63 sections, 12 figures.

Figures (12)

  • Figure 1: Organization of the Survey.
  • Figure 2: Various strategies for enhancing wireless communication systems: optimizing resource management in existing networks, improving transmission efficiency by transitioning from traditional to semantic communication, and leveraging environmental sensing to acquire physical context information for enhanced antenna beamforming.
  • Figure 3: Diagram depicting the three stages of integrating RL into training LLMs: (1) Supervised fine-tuning, (2) Reward definition, and (3) Policy optimization [schulman2017proximal_arxiv, shao2024deepseekmath_arxiv]. Step 2 shows the comparison of three reward model definition approaches: RL from Human Feedback (RLHF) [ouyang_long2022RLHF], RL from AI Feedback (RLAIF) [bai_yuntao2022RLAIF], and Pure RL [guo_daya2025deepseekr1]. RLHF and RLAIF share similar pipelines, differing only in how they generate rating responses.
  • Figure 4: An example of an RL application in the Tic-Tac-Toe game. The agent learns optimal strategies through self-play, updating the state-action value function to enhance decision-making over time.
  • Figure 5: An example of CoT. Different from standard prompting, CoT prompting explicitly outlines the reasoning process, leading to more interpretable and accurate results.
  • ...and 7 more figures