Understanding 6G through Language Models: A Case Study on LLM-aided Structured Entity Extraction in Telecom Domain
Ye Yuan, Haolun Wu, Hao Zhou, Xue Liu, Hao Chen, Yan Xin, Jianzhong Zhang
TL;DR
This work addresses the challenge of extracting structured telecom knowledge to support AI-native 6G networks by introducing TeleSEE, a language-model-based framework that combines a token-efficient, schema-guided representation of entity types and attribute keys with a three-stage hierarchical decoding pipeline. It formalizes structured entity extraction for the telecom domain, develops an evaluation metric with multiple matching variants, and provides a new dataset, 6GTech, for benchmarking. TeleSEE achieves higher extraction accuracy than the baselines and processes samples 5 to 9 times faster, highlighting the benefits of modular, schema-aware generation for complex entity-attribute extraction. The findings suggest TeleSEE’s potential to enable scalable knowledge bases and graphs for telecom systems, with future work on broader knowledge integration and graph-based representations.
Abstract
Knowledge understanding is a foundational part of envisioned 6G networks, needed to advance network intelligence and AI-native network architectures. In this paradigm, information extraction plays a pivotal role in transforming fragmented telecom knowledge into well-structured formats, helping diverse AI models better understand network terminology. This work proposes a novel language-model-based information extraction technique that extracts structured entities from telecom contexts. The proposed telecom structured entity extraction (TeleSEE) technique applies a token-efficient representation method to predict entity types and attribute keys, aiming to reduce the number of output tokens and improve prediction accuracy. TeleSEE also uses a hierarchical parallel decoding method, which extends the standard encoder-decoder architecture by integrating additional prompting and decoding strategies into entity extraction tasks. In addition, to better evaluate the proposed technique in the telecom domain, we construct a dataset named 6GTech, comprising 2390 sentences and 23747 words drawn from more than 100 6G-related technical publications. Experiments show that TeleSEE achieves higher accuracy than the baseline techniques and a sample processing speed 5 to 9 times higher.
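As a rough illustration of the idea described in the abstract, the sketch below shows what schema-guided output targets could look like when each entity type and attribute key is mapped to a single reserved token and extraction is split into stages. It is a minimal Python sketch under stated assumptions, not the TeleSEE implementation: the token strings, the schema, the stage split (types, then keys, then values), the helper names, and the example entities are all hypothetical and are not taken from the paper or the 6GTech dataset.

```python
from dataclasses import dataclass, field

# Hypothetical schema: every entity type and attribute key is assigned one
# reserved token, so predicting it costs a single output token instead of
# spelling out the full name.
TYPE_TOKENS = {"technology": "<ent_type_0>", "metric": "<ent_type_1>"}
KEY_TOKENS = {
    "name": "<attr_key_0>",
    "frequency_band": "<attr_key_1>",
    "value": "<attr_key_2>",
}


@dataclass
class Entity:
    """A structured entity: one type plus a set of attribute key-value pairs."""
    etype: str
    attributes: dict[str, str] = field(default_factory=dict)


def stage1_types(entities: list[Entity]) -> list[str]:
    """Assumed stage 1 target: one reserved token per entity in the text."""
    return [TYPE_TOKENS[e.etype] for e in entities]


def stage2_keys(entity: Entity) -> list[str]:
    """Assumed stage 2 target: one reserved token per attribute key of an entity."""
    return [KEY_TOKENS[k] for k in entity.attributes]


def stage3_value(entity: Entity, key: str) -> str:
    """Assumed stage 3 target: the free-text value for one (entity, key) pair."""
    return entity.attributes[key]


# Made-up entities for illustration only.
entities = [
    Entity("technology", {"name": "terahertz communication",
                          "frequency_band": "0.1-10 THz"}),
    Entity("metric", {"name": "peak data rate", "value": "1 Tbps"}),
]

print(stage1_types(entities))
for e in entities:
    print(stage2_keys(e), [stage3_value(e, k) for k in e.attributes])
```

In this sketch only the attribute values require free-text generation; types and keys come from a fixed vocabulary of reserved tokens. The per-entity and per-key calls are also independent of one another, which is the property that would let such stages be decoded in parallel.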
