Large Model Enabled Embodied Intelligence for 6G Integrated Perception, Communication, and Computation Network

Zhuoran Li, Zhen Gao, Xinhua Liu, Zheng Wang, Xiaotian Zhou, Lei Liu, Yongpeng Wu, Wei Feng, Yongming Huang

TL;DR

This work proposes large-model-enabled embodied intelligent base station agents (IBSAs) for 6G, embedding perception, cognition, and action at the base station level to create a closed-loop perception-communication-computation system. It introduces a three-layer IBSA architecture supported by cloud-edge-end collaboration and digital twins, demonstrated in two safety-critical scenarios: cooperative autonomous driving and low-altitude UAV safety. The paper outlines enabling technologies (cognitive cores, edge deployment, privacy/security, and digital twins) and a holistic evaluation framework with the IBSA-Score to quantify performance across perception, network, and agent dimensions. It also discusses generalization to Industrial IoT and Smart City contexts, along with challenges in benchmarks, continual adaptation, standardization, and trustworthy AI for practical deployment.

Abstract

The advent of sixth-generation (6G) places intelligence at the core of wireless architecture, fusing perception, communication, and computation into a single closed loop. This paper argues that large artificial intelligence models (LAMs) can endow base stations (BSs) with perception, reasoning, and acting capabilities, thus transforming them into intelligent base station agents (IBSAs). We first review the historical evolution of BSs from single-functional analog infrastructure to distributed, software-defined, and finally LAM-empowered IBSAs, highlighting the accompanying changes in architecture, hardware platforms, and deployment. We then present an IBSA architecture that couples a perception-cognition-execution pipeline with cloud-edge-end collaboration and parameter-efficient adaptation. Subsequently, we study two representative scenarios: (i) cooperative vehicle-road perception for autonomous driving, and (ii) ubiquitous base station support for low-altitude uncrewed aerial vehicle safety monitoring and response against unauthorized drones. On this basis, we analyze key enabling technologies spanning LAM design and training, efficient edge-cloud inference, multi-modal perception and actuation, as well as trustworthy security and governance. We further propose a holistic evaluation framework and benchmark considerations that jointly cover communication performance, perception accuracy, decision-making reliability, safety, and energy efficiency. Finally, we distill open challenges on benchmarks, continual adaptation, trustworthy decision-making, and standardization. Together, this work positions LAM-enabled IBSAs as a practical path toward safety-critical 6G systems in which perception, communication, and computation are natively integrated.

Paper Structure

This paper contains 52 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: The physical evolution of the BS from 1G to 6G and beyond. The architecture has progressed from (Stage 1-2) centralized cabinets (analog BS, digital BTS) with high-loss coaxial feeders to (Stage 3) a distributed structure with a baseband processing unit (BBU) and remote radio unit (RRU) connected by optical fiber. This was followed by (Stage 4-5) the integration of RF and antenna into an active antenna unit (AAU) to support MIMO and massive MIMO. This integration path provides the foundation for (Stage 6) the 6G ISAC BS and (Stage 7) the LAM-enabled IBSA, which leverages multi-modal sensors like lidar and cameras for embodied intelligence.
  • Figure 2: Architecture of embodied intelligent BS agent. The framework is centered around an LAM that empowers the embodied IBSA through a three-layer architecture, consisting of a perception layer, a cognition layer, and an execution layer. At the top, various simulation platforms are used to synthesize and replay scene data for closed-loop evaluation. The agent processes multi-modality data inputs, such as infrared, visual images, wireless signals, and lidar point clouds. This enables the embodied IBSA to perform various downstream tasks, which include prediction, beamforming, resource allocation, and network cooperation. The entire architecture is supported by the foundational components of data, compute resources, and algorithms.
  • Figure 3: Cooperative vehicle-road multi-source fusion perception empowering autonomous driving. This figure illustrates the complete closed-loop workflow of multi-source fusion perception across the scenario, data, and network layers. The entities on the left, "Vehicle, IBSA, Camera, and Cloud Server", delineate the primary roles in the system. Within the road scenario, the colored arrows represent data flows: blue arrows denote the uplink of environmental information from the embodied IBSA to the cloud; red arrows signify the acquisition and transmission of visual data from cameras; and green arrows indicate robust beamforming and high-fidelity communication from the embodied IBSA to vehicles, guided by perception results and cloud directives. The workflow on the right begins with the collection of "Multi-Modality Data", including RF signals, visual images, and lidar point clouds. This data then proceeds through a vertical pipeline of "Spatiotemporal Alignment", "Edge Inference", and "Network Cooperation" to achieve perception-communication-computation co-optimization for autonomous driving tasks. The overall diagram highlights the embodied IBSA's pivotal role as a perception-communication bridge within this closed-loop system.
  • Figure 4: Ubiquitous embodied IBSAs empower low-altitude safety. This figure illustrates the closed-loop process for low-altitude security enabled by the IBSA, detailing entity roles, information flows, and the decision chain. In the central scenario, green arrows represent the continuous monitoring and data uplink for legitimate UAVs, while red arrows signify the early warning, targeting, and eventual jamming of unauthorized or suspicious UAVs entering a restricted area. The vertical arrows indicate data uploads from embodied IBSAs and cameras to the edge/cloud server, and the white bidirectional arrows depict the crucial role of "Network Cooperation" among multiple embodied IBSAs and cloud-edge nodes for tasks like collaborative localization and resource scheduling. The workflow on the right outlines the full decision chain, progressing from "Detection and Monitoring", through "Evaluation and Decision", to responsive actions like "Jamming" or issuing control commands to neutralize threats. This schematic underscores the embodied IBSA's central function in an integrated system that provides perception, assessment, and interference capabilities for low-altitude airspace management.
  • Figure 5: Performance comparison of different fusion strategies across modalities (lidar, camera, RF). Data is derived from yang2024v2x. The significant gap between "No Fusion" and cooperative methods (e.g., HEAL lu2024heal, CoAlign lu2023coalign, F-Cooper chen2019fcooper, V2X-ViT xu2022v2xvit) demonstrates the necessity of network-level collaboration for the IBSA. AP: average precision; Sync: synchronous; Async: asynchronous; AM: average transmission cost in megabytes.
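
The perception-cognition-execution loop described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Observation` fields, the confidence-weighted fusion rule, the 0.5 confidence threshold, and the `steer_beam` action name are all hypothetical stand-ins, and the rule-based `decide` function only marks the place where the LAM would sit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    """One fused snapshot from the perception layer (hypothetical fields)."""
    user_id: str
    azimuth_deg: float  # estimated direction of the user as seen from the BS
    confidence: float   # best per-modality confidence, in [0, 1]


def perceive(raw: Dict[str, List[dict]]) -> List[Observation]:
    """Perception layer: merge per-modality detections that share a user id.

    Assumed fusion rule: confidence-weighted mean azimuth. Angle wrap-around
    at 0/360 degrees is ignored to keep the sketch short.
    """
    grouped: Dict[str, List[dict]] = {}
    for detections in raw.values():
        for det in detections:
            grouped.setdefault(det["user_id"], []).append(det)

    fused = []
    for user_id, dets in grouped.items():
        total = sum(d["confidence"] for d in dets)
        if total == 0:
            continue  # nothing trustworthy to fuse for this user
        azimuth = sum(d["azimuth_deg"] * d["confidence"] for d in dets) / total
        fused.append(Observation(user_id, azimuth, max(d["confidence"] for d in dets)))
    return fused


def decide(observations: List[Observation], min_confidence: float = 0.5) -> List[dict]:
    """Cognition layer: placeholder for the LAM (here, a fixed threshold rule)."""
    return [
        {"action": "steer_beam", "user_id": o.user_id, "azimuth_deg": round(o.azimuth_deg, 1)}
        for o in observations
        if o.confidence >= min_confidence
    ]


def execute(actions: List[dict]) -> List[str]:
    """Execution layer: a real agent would configure the radio; this one logs."""
    return [f"{a['action']} -> {a['user_id']} @ {a['azimuth_deg']} deg" for a in actions]


def run_once(raw: Dict[str, List[dict]]) -> List[str]:
    """One pass through the perception -> cognition -> execution loop."""
    return execute(decide(perceive(raw)))


if __name__ == "__main__":
    sample = {
        "camera": [{"user_id": "car-1", "azimuth_deg": 30.0, "confidence": 0.75}],
        "lidar": [
            {"user_id": "car-1", "azimuth_deg": 34.0, "confidence": 0.25},
            {"user_id": "car-2", "azimuth_deg": 100.0, "confidence": 0.25},
        ],
    }
    print(run_once(sample))
```

Keeping the three layers as separate functions mirrors the layered framing in the caption: the threshold rule in `decide` could be replaced by a model call without touching the fusion or actuation code.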