
Semantic Information Extraction for Text Data with Probability Graph

Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Licheng Lin, Zhaoyang Zhang

TL;DR

This work tackles semantic information extraction for text transmission over resource-constrained networks. Semantic content is encoded in a probability-augmented knowledge graph, and the most important semantic information is selected with a framework based on Floyd's shortest-path algorithm combined with sorting. The work introduces semantic uncertainty (SU) and semantic similarity (SS) as metrics for evaluating extraction quality, and it formulates an optimization that minimizes the entropy of the chosen semantic triples under depth and compression constraints. The key contributions are a probabilistic knowledge graph representation, a constraint-driven extraction algorithm parameterized by the compression ratio $K=H/G$ and the depth $D$, and empirical evidence of improved SU and SS relative to baselines. The approach enables efficient semantic transmission with quantified clarity and fidelity, making it suitable for low-bandwidth scenarios such as satellite or underwater links, and it points to future joint optimization of communication and computation resources.
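To make the entropy objective concrete, here is a minimal sketch of one way semantic uncertainty could be scored. It assumes, as Figure 2 suggests, that each edge carries a distribution over candidate relations, and that SU is the average Shannon entropy of those distributions over the selected edges. The names `relation_entropy` and `semantic_uncertainty` are hypothetical, and the paper's exact SU definition may differ.

```python
import math
from typing import Dict, List


def relation_entropy(relation_probs: Dict[str, float]) -> float:
    """Shannon entropy (bits) of one edge's relation-probability distribution.

    Probabilities are renormalized to sum to 1; zero entries contribute nothing.
    """
    total = sum(relation_probs.values())
    if total <= 0:
        raise ValueError("an edge needs at least one positive relation probability")
    entropy = 0.0
    for p in relation_probs.values():
        q = p / total
        if q > 0:
            entropy -= q * math.log2(q)
    return entropy


def semantic_uncertainty(edges: List[Dict[str, float]]) -> float:
    """Hypothetical SU score: mean relation entropy over the selected edges."""
    if not edges:
        return 0.0
    return sum(relation_entropy(e) for e in edges) / len(edges)


if __name__ == "__main__":
    certain = {"born_in": 1.0}                  # one unambiguous relation
    ambiguous = {"born_in": 0.5, "lived_in": 0.5}  # two equally likely relations
    print(semantic_uncertainty([certain, ambiguous]))
```

Under this reading, a lower score means the transmitted relations are less ambiguous, which is the direction an entropy-minimizing selection pushes.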

Abstract

In this paper, the problem of semantic information extraction for resource-constrained text data transmission is studied. In the considered model, a sequence of text data needs to be transmitted within a communication resource-constrained network, which only allows limited data transmission. Thus, at the transmitter, semantic information is extracted from the original text data with natural language processing techniques. Then, the extracted semantic information is captured in a knowledge graph. An additional probability dimension is introduced in this graph to capture the importance of each piece of information. This semantic information extraction problem is posed as an optimization framework whose goal is to extract the most important semantic information for transmission. To find an optimal solution for this problem, a solution based on Floyd's algorithm, coupled with an efficient sorting mechanism, is proposed. Numerical results demonstrate the effectiveness of the proposed algorithm with regard to two novel performance metrics: semantic uncertainty and semantic similarity.
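Floyd's algorithm here presumably refers to the Floyd–Warshall all-pairs shortest-path algorithm. The sketch below computes relational distances (compare Figure 3) between entities, assuming undirected relation edges of unit weight, and picks a central concept with a hypothetical rule: the entity with the fewest unreachable entities and, among those, the smallest total distance. The toy graph is loosely themed on the paper's Bruce Lee example but is illustrative only, not the paper's data; `floyd_warshall` and `central_concept` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

INF = math.inf
Dist = Dict[str, Dict[str, float]]


def floyd_warshall(entities: List[str], edges: List[Tuple[str, str]]) -> Dist:
    """All-pairs shortest relational distance (hop count) between entities.

    Assumption: each relation is an undirected edge of weight 1.
    """
    dist: Dist = {u: {v: (0.0 if u == v else INF) for v in entities} for u in entities}
    for u, v in edges:
        dist[u][v] = dist[v][u] = 1.0
    # Classic triple loop: allow paths that pass through intermediate entity k.
    for k in entities:
        for i in entities:
            for j in entities:
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


def central_concept(dist: Dist) -> str:
    """Hypothetical rule: fewest unreachable entities, then smallest total distance."""
    def score(u: str) -> Tuple[int, float]:
        row = list(dist[u].values())
        unreachable = sum(1 for d in row if d == INF)
        total = sum(d for d in row if d != INF)
        return unreachable, total
    return min(dist, key=score)


if __name__ == "__main__":
    entities = ["Bruce Lee", "San Francisco", "Hong Kong", "Jeet Kune Do", "United States"]
    edges = [
        ("Bruce Lee", "San Francisco"),
        ("Bruce Lee", "Hong Kong"),
        ("Bruce Lee", "Jeet Kune Do"),
        ("San Francisco", "United States"),
    ]
    dist = floyd_warshall(entities, edges)
    print(central_concept(dist))               # Bruce Lee
    print(dist["Hong Kong"]["United States"])  # 3.0
```

The cubic triple loop is fine for the small per-document graphs this setting implies; larger graphs would call for per-source breadth-first search instead.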

Paper Structure

This paper contains 8 sections, 10 equations, and 8 figures.

Figures (8)

  • Figure 1: An example of knowledge graph.
  • Figure 2: Each edge between two entities consists of several relation probabilities.
  • Figure 3: An example of central concept and relational distance.
  • Figure 4: An example of an original text and the extracted semantic information about "Bruce Lee".
  • Figure 5: Original quadruples obtained from the text data about "Bruce Lee", and the selected quadruples using our algorithm when $K=0.5$ and $D=2$. Note that the selected quadruples are marked in green.
  • ...and 3 more figures
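The selection illustrated in Figure 5 can be sketched under stated assumptions: each quadruple is (head, relation, tail, probability); $K=H/G$ is read as the fraction of the $G$ extracted quadruples that may be transmitted, so the budget is $H=\lfloor KG \rfloor$; and a quadruple is eligible only if both of its entities lie within relational distance $D$ of the central concept. Eligible quadruples are then sorted by probability and the top $H$ are kept. `Quadruple` and `select_quadruples` are hypothetical names, the distances are passed in precomputed (for example from an all-pairs shortest-path run), and the paper's exact constraint handling may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Quadruple:
    head: str
    relation: str
    tail: str
    prob: float  # importance attached to the triple (the extra probability dimension)


def select_quadruples(
    quads: List[Quadruple],
    depth: Dict[str, float],  # relational distance of each entity from the central concept
    K: float,
    D: float,
) -> List[Quadruple]:
    """Hypothetical rule: depth filter, sort by probability, keep floor(K * G)."""
    if not 0.0 < K <= 1.0:
        raise ValueError("K must lie in (0, 1]")
    budget = math.floor(K * len(quads))  # H = floor(K * G), assuming K counts quadruples
    eligible = [
        q for q in quads
        if max(depth.get(q.head, math.inf), depth.get(q.tail, math.inf)) <= D
    ]
    # Most probable (most important) first; sorted() stays stable for ties.
    ranked = sorted(eligible, key=lambda q: q.prob, reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    quads = [
        Quadruple("Bruce Lee", "born_in", "San Francisco", 0.9),
        Quadruple("Bruce Lee", "founded", "Jeet Kune Do", 0.8),
        Quadruple("San Francisco", "located_in", "United States", 0.95),
        Quadruple("Bruce Lee", "grew_up_in", "Hong Kong", 0.7),
    ]
    depth = {"Bruce Lee": 0, "San Francisco": 1, "Jeet Kune Do": 1,
             "Hong Kong": 1, "United States": 2}
    print([q.relation for q in select_quadruples(quads, depth, K=0.5, D=1)])
    print([q.relation for q in select_quadruples(quads, depth, K=0.5, D=2)])
```

With these illustrative values, $D=1$ keeps `born_in` and `founded`, while $D=2$ lets the more distant but higher-probability `located_in` displace `founded`, which shows how the depth and compression parameters trade off against each other.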