A Mathematical Framework of Semantic Communication based on Category Theory
Shuheng Hua, Yao Sun, Kairong Ma, Dusit Niyato, Muhammad Ali Imran
TL;DR
This work introduces a category-theoretic mathematical framework for semantic communication (SemCom) that models semantic entities within a semantic probability space and defines the semantic entropy $H_s(E)$. By proving that $H_s(E)$ can be reduced using knowledge bases (KBs) that capture semantic dependencies, the paper derives a KB-enhanced semantic channel capacity $C_s$ and demonstrates entropy reduction at both the entity and message levels. The framework is validated numerically through KB-aided semantic coding based on Fano coding, which achieves higher coding efficiency and lower entropy than traditional approaches, particularly for distributions with strong semantic dependencies (e.g., Zipf distributions). Overall, the approach provides a rigorous foundation for designing and analyzing SemCom systems with KB integration, and it offers guidance on optimizing KB depth and semantic representations in practical wireless settings.
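The TL;DR mentions KB-aided semantic coding built on Fano coding. As a minimal sketch of the underlying classical technique only (not the paper's KB-aided coder), the Python below builds a Fano code by recursively splitting a probability-sorted symbol list into two groups of nearly equal total probability. It then compares the average codeword length with the Shannon entropy of a Zipf-distributed source. The helper names `fano_code`, `entropy`, and `zipf`, the 8-entity alphabet, and the exponent s = 1 are hypothetical choices made for illustration.

```python
import math


def fano_code(probs):
    """Build a binary Fano code.

    probs: list of (symbol, probability) pairs.
    Returns a dict mapping each symbol to its codeword string.
    """
    # Fano's method works on symbols sorted by decreasing probability.
    items = sorted(probs, key=lambda sp: sp[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Choose the cut that makes the two groups' probabilities most equal;
        # i runs from 1 to len-1 so both groups are always non-empty.
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        split(left)
        split(right)

    split(items)
    return codes


def entropy(ps):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in ps if p > 0)


def zipf(n, s=1.0):
    """Normalized Zipf pmf over ranks 1..n with exponent s (illustrative source)."""
    weights = [1 / k ** s for k in range(1, n + 1)]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    ps = zipf(8)
    symbols = [f"e{k}" for k in range(1, 9)]
    codes = fano_code(list(zip(symbols, ps)))
    avg_len = sum(p * len(codes[s]) for s, p in zip(symbols, ps))
    print(f"H = {entropy(ps):.3f} bits, Fano average length = {avg_len:.3f} bits")
```

Because every split produces two non-empty groups, the code tree is full, so the code is prefix-free and, like any prefix-free code, its average length cannot fall below the entropy.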
Abstract
While semantic communication (SemCom) has recently demonstrated great potential to enhance transmission efficiency and reliability by leveraging machine learning (ML) and knowledge bases (KBs), mathematical models that rigorously characterize SemCom systems and quantify the performance gains obtained from ML and KBs are still lacking. In this paper, we develop a mathematical framework for SemCom based on category theory, rigorously modeling the concepts of semantic entities and the semantic probability space. Within this framework, we introduce semantic entropy to quantify the uncertainty of semantic entities. We theoretically prove that semantic entropy can be effectively reduced by exploiting KBs, which capture semantic dependencies. Within the formulated semantic space, semantic entities can be combined according to the required semantic ambiguity, and the combined entities can be encoded using semantic dependencies obtained from the KB. We then derive a semantic channel capacity model that incorporates the mutual information obtained from the KB to accurately measure the transmission efficiency of SemCom. Numerical simulations validate the effectiveness of the proposed framework, showing that SemCom with KB integration outperforms traditional communication in both entropy reduction and coding efficiency.
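The claim that a KB reduces semantic entropy parallels the classical information-theoretic fact that conditioning cannot increase entropy: H(E | K) = H(E) − I(E; K) ≤ H(E). The sketch below assumes, purely for illustration, that the KB can be modeled as a side-information random variable K shared by transmitter and receiver. This is a simplification and not the paper's category-theoretic construction. The entities `river_bank`/`money_bank`, the contexts `nature`/`finance`, and their joint probabilities are invented toy data, and the helper names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy joint pmf p(e, k): E is a semantic entity, K is a KB context.
# Treating the KB as a random variable K is an illustrative assumption.
JOINT = {
    ("river_bank", "nature"): 0.40,
    ("money_bank", "nature"): 0.05,
    ("river_bank", "finance"): 0.05,
    ("money_bank", "finance"): 0.50,
}


def marginals(joint):
    """Return the marginal pmfs p(e) and p(k) of a joint pmf {(e, k): p}."""
    pe, pk = defaultdict(float), defaultdict(float)
    for (e, k), p in joint.items():
        pe[e] += p
        pk[k] += p
    return dict(pe), dict(pk)


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: p}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def conditional_entropy(joint):
    """H(E | K) = -sum over (e, k) of p(e, k) * log2 p(e | k)."""
    _, pk = marginals(joint)
    return -sum(p * math.log2(p / pk[k]) for (e, k), p in joint.items() if p > 0)


def mutual_information(joint):
    """I(E; K) = sum over (e, k) of p(e, k) * log2 [p(e, k) / (p(e) p(k))]."""
    pe, pk = marginals(joint)
    return sum(p * math.log2(p / (pe[e] * pk[k]))
               for (e, k), p in joint.items() if p > 0)


if __name__ == "__main__":
    pe, _ = marginals(JOINT)
    h_e = entropy(pe)
    h_e_given_k = conditional_entropy(JOINT)
    i_ek = mutual_information(JOINT)
    print(f"H(E) = {h_e:.3f} bits, H(E|K) = {h_e_given_k:.3f} bits, "
          f"I(E;K) = {i_ek:.3f} bits")
```

In this toy model the entropy saved by the KB is exactly I(E; K); how the paper folds KB-derived mutual information into $C_s$ is not reproduced here.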
