
Exploiting Duality in Open Information Extraction with Predicate Prompt

Zhen Chen, Jingping Liu, Deqing Yang, Yanghua Xiao, Huimin Xu, Zongyu Wang, Rui Xie, Yunsen Xian

TL;DR

This work tackles Open Information Extraction (OpenIE) for sentences that contain multiple complicated triplets. The proposed DualOIE model couples the primary extraction task (sentence to triplets) with an auxiliary dual task that converts the triplets back into the source sentence. It first generates a sequence of all candidate predicates and then uses that sequence as a prompt to generate all triplets in a single decoding pass, while an encoder shared across both directions encourages the model to learn the sentence's structure. The authors also contribute MTOIE, a large-scale dataset built from Meituan data that contains many implicit triplets, enabling more realistic evaluation. On CaRB, SAOKE, and MTOIE, DualOIE outperforms state-of-the-art baselines, and an online A/B test on Meituan's search system shows gains of 0.93% in QV-CTR and 0.56% in UV-CTR. The work thus advances OpenIE by combining dual-task learning with explicit predicate prompts to extract implicit, overlapping, and discontinuous triplets from real-world text.
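The dual-task idea can be illustrated at the data level. The sketch below is a minimal, hypothetical Python example that assumes a plain-text linearization of triplets. The separators `[T]` and `|`, the task prefixes `S->T:` and `T->S:`, and all function names are illustrative assumptions, not DualOIE's actual format. It turns one annotated sentence into a training pair for the primary task and one for the dual task, so that a single sequence-to-sequence model can be trained in both directions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical separators; DualOIE's real special tokens may differ.
TRIP_SEP = "[T]"   # between triplets
FIELD_SEP = "|"    # between subject, predicate, and object


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    object: str


def linearize(triplets: List[Triplet]) -> str:
    """Flatten triplets into one string, e.g. 'Alice | founded | Acme [T] ...'."""
    return f" {TRIP_SEP} ".join(
        f"{t.subject} {FIELD_SEP} {t.predicate} {FIELD_SEP} {t.object}"
        for t in triplets
    )


def parse(text: str) -> List[Triplet]:
    """Inverse of linearize: recover well-formed triplets from generated text."""
    triplets = []
    for chunk in text.split(TRIP_SEP):
        parts = [p.strip() for p in chunk.split(FIELD_SEP)]
        if len(parts) == 3 and all(parts):  # skip malformed chunks
            triplets.append(Triplet(*parts))
    return triplets


def build_dual_pairs(sentence: str, triplets: List[Triplet]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for the S->T task and its dual T->S task."""
    linearized = linearize(triplets)
    return [
        (f"S->T: {sentence}", linearized),  # primary task: extract triplets
        (f"T->S: {linearized}", sentence),  # dual task: reconstruct the sentence
    ]


# Usage: "located in" is an implicit predicate that never appears in the sentence.
example_sentence = "Alice founded Acme in Paris."
example_triplets = [
    Triplet("Alice", "founded", "Acme"),
    Triplet("Acme", "located in", "Paris"),
]
pairs = build_dual_pairs(example_sentence, example_triplets)
for source, target in pairs:
    print(source, "=>", target)
```

Because both pairs go through the same model, the T->S pair acts as extra supervision on the shared encoder. Any loss weighting between the two tasks is left out of this sketch.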

Abstract

Open information extraction (OpenIE) aims to extract schema-free triplets of the form (subject, predicate, object) from a given sentence. Compared with general information extraction (IE), OpenIE poses greater challenges for IE models, especially when a sentence contains multiple complicated triplets. To extract such triplets more effectively, in this paper we propose a novel generative OpenIE model, DualOIE, which performs a dual task, converting the triplets back into the sentence, alongside extracting triplets from the sentence. This dual task encourages the model to correctly recognize the structure of the given sentence and thus helps it extract all potential triplets. Specifically, DualOIE extracts triplets in two steps: 1) it first extracts a sequence of all potential predicates, and 2) it then uses this predicate sequence as a prompt to induce the generation of triplets. Experiments on two benchmarks and on our dataset constructed from Meituan demonstrate that DualOIE achieves the best performance among state-of-the-art baselines. Furthermore, an online A/B test on the Meituan platform shows improvements of 0.93% in QV-CTR and 0.56% in UV-CTR when the triplets extracted by DualOIE were leveraged in Meituan's search system.
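The two-step extraction can likewise be sketched as the construction of training targets. This is a minimal, hypothetical Python illustration with the following assumptions. Step 1 targets the deduplicated predicate sequence in first-occurrence order. Step 2 conditions on the sentence plus that sequence and emits all triplets in one string, grouped by predicate. The markers `[PROMPT]`, `[T]`, and `;`, this ordering rule, and the function names are assumptions, not the paper's specification.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical markers; the paper's actual prompt format may differ.
PROMPT_MARK = "[PROMPT]"
PRED_SEP = " ; "
TRIP_SEP = " [T] "


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    object: str


def predicate_sequence(triplets: List[Triplet]) -> List[str]:
    """Step 1 target: every distinct predicate, in first-occurrence order."""
    return list(dict.fromkeys(t.predicate for t in triplets))


def step2_input(sentence: str, predicates: List[str]) -> str:
    """Step 2 input: the sentence followed by the predicate sequence as a prompt."""
    return f"{sentence} {PROMPT_MARK} {PRED_SEP.join(predicates)}"


def step2_target(triplets: List[Triplet], predicates: List[str]) -> str:
    """Step 2 target: all triplets in one sequence, ordered to follow the prompt."""
    rank = {p: i for i, p in enumerate(predicates)}
    ordered = sorted(triplets, key=lambda t: rank[t.predicate])  # stable sort
    return TRIP_SEP.join(f"({t.subject}, {t.predicate}, {t.object})" for t in ordered)


# Usage: two triplets share the predicate "founded"; "located in" is implicit.
sentence = "Alice founded Acme, based in Paris, and later Beta Labs."
triplets = [
    Triplet("Alice", "founded", "Acme"),
    Triplet("Acme", "located in", "Paris"),
    Triplet("Alice", "founded", "Beta Labs"),
]
preds = predicate_sequence(triplets)
print(preds)  # ['founded', 'located in']
print(step2_input(sentence, preds))
print(step2_target(triplets, preds))
```

At inference time, the step 1 predicates would come from the model's own output rather than from gold annotations.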

Paper Structure

This paper contains 39 sections, 15 equations, 6 figures, 9 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The overall framework of our proposed DualOIE, showing how it performs two tasks in opposite directions, $\textbf{S} \rightarrow \textbf{T}$ (sentence to triplets) and $\textbf{T} \rightarrow \textbf{S}$ (triplets to sentence).
  • Figure 2: Performance on extraction of complicated triplets in SAOKE.
  • Figure 3: The prompts we designed to instruct ChatGPT to perform the OpenIE task.
  • Figure 4: Performance comparisons on the groups with different triplet numbers ($m$) of a sentence in SAOKE.
  • Figure 5: Correlation analysis between the BLEU of T to S (X-axis) and the F1 of S to T (Y-axis).