Semantic Information Extraction for Text Data with Probability Graph
Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Licheng Lin, Zhaoyang Zhang
TL;DR
This work tackles semantic information extraction for text transmission over resource-constrained networks by encoding semantic content into a probability-augmented knowledge graph and selecting the most important semantic information via a Floyd-based shortest-path framework and sorting. It introduces semantic uncertainty (SU) and semantic similarity (SS) as metrics to evaluate extraction quality and formulates an optimization problem that minimizes the entropy of the chosen semantic triples under depth and compression constraints. The key contributions are a probabilistic knowledge graph representation, a constraint-driven extraction algorithm using $K=H/G$ and depth $D$, and empirical evidence showing improved SU and SS relative to baselines. The approach enables efficient semantic transmission with quantified clarity and fidelity, suitable for low-bandwidth scenarios such as satellite or underwater links, and points to future joint optimization of communication and computation resources.
Abstract
In this paper, the problem of semantic information extraction for resource-constrained text data transmission is studied. In the considered model, a sequence of text data needs to be transmitted over a communication resource-constrained network that allows only limited data transmission. Thus, at the transmitter, semantic information is extracted from the original text data using natural language processing techniques. The extracted semantic information is then captured in a knowledge graph. An additional probability dimension is introduced in this graph to capture the importance of each piece of information. The semantic information extraction problem is posed as an optimization problem whose goal is to extract the most important semantic information for transmission. To find an optimal solution to this problem, a solution based on Floyd's algorithm, coupled with an efficient sorting mechanism, is proposed. Numerical results demonstrate the effectiveness of the proposed algorithm with regard to two novel performance metrics, namely semantic uncertainty and semantic similarity.
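
To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch of Floyd's all-pairs shortest-path algorithm on a small probability-weighted graph, followed by a sort that keeps the highest-probability triples. It is an illustration under stated assumptions, not the paper's algorithm: the example triples, the edge cost of -log2(p), and the helper names `floyd_warshall` and `top_triples` are all hypothetical. The -log2(p) cost is chosen only because it turns the most probable path into the shortest path.

```python
import math


def floyd_warshall(nodes, weights):
    """All-pairs shortest paths. `weights` maps (u, v) to a non-negative cost."""
    dist = {
        (u, v): 0.0 if u == v else weights.get((u, v), math.inf)
        for u in nodes
        for v in nodes
    }
    for k in nodes:              # intermediate node
        for i in nodes:          # source
            for j in nodes:      # destination
                via_k = dist[i, k] + dist[k, j]
                if via_k < dist[i, j]:
                    dist[i, j] = via_k
    return dist


def top_triples(triples, keep):
    """Return the `keep` triples with the highest probability (stable sort)."""
    return sorted(triples, key=lambda t: t[3], reverse=True)[:keep]


# Hypothetical (head, relation, tail, probability) triples, for illustration only.
triples = [
    ("satellite", "sends", "image", 0.5),
    ("image", "shows", "storm", 0.5),
    ("satellite", "detects", "storm", 0.125),
]

nodes = sorted({n for h, _, t, _ in triples for n in (h, t)})
# Assumed edge cost: -log2(p). Parallel edges between the same pair of
# entities would overwrite each other in this simple dictionary.
weights = {(h, t): -math.log2(p) for h, _, t, p in triples}

dist = floyd_warshall(nodes, weights)
print(dist["satellite", "storm"])   # cost of the cheapest satellite-to-storm path
print(top_triples(triples, 2))      # the two most probable triples
```

In this toy graph the two-hop path from "satellite" to "storm" costs less than the direct edge, because the product of the two hop probabilities exceeds the direct edge's probability. How the paper combines path costs with the depth and compression constraints is not specified in the abstract and is not modeled here.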
