
LLM-based Atomic Propositions help weak extractors: Evaluation of a Propositioner for triplet extraction

Luc Pommeret, Thomas Gerald, Patrick Paroubek, Sahar Ghannay, Christophe Servan, Sophie Rosset

Abstract

Knowledge Graph construction from natural language requires extracting structured triplets from complex, information-dense sentences. In this paper, we investigate whether decomposing text into atomic propositions (minimal, semantically autonomous units of information) can improve triplet extraction. We introduce MPropositionneur-V2, a small multilingual model covering six European languages trained by knowledge distillation from Qwen3-32B into a Qwen3-0.6B architecture, and we evaluate its integration into two extraction paradigms: entity-centric (GLiREL) and generative (Qwen3). Experiments on SMiLER, FewRel, DocRED and CaRB show that atomic propositions benefit weaker extractors (GLiREL, CoreNLP, 0.6B models), improving relation recall and, in the multilingual setting, overall accuracy. For stronger LLMs, a fallback combination strategy recovers entity recall losses while preserving the gains in relation extraction. These results show that atomic propositions are an interpretable intermediate data structure that complements extractors without replacing them.

Paper Structure

This paper contains 28 sections, 3 theorems, 4 equations, 5 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Let $\phi = A \land B$ where $A$ and $B$ are logically independent. Extracting the component $A$ is a strictly safe operation.
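The lemma is an instance of conjunction elimination: from a conjunction of independent propositions, either conjunct can be extracted without loss of soundness. A minimal formalisation in Lean (ours, not taken from the paper):

```lean
-- Safe cut: extracting the left conjunct of φ = A ∧ B
-- preserves truth (conjunction elimination).
theorem safe_cut (A B : Prop) (h : A ∧ B) : A :=
  h.left

-- By symmetry, extracting the right conjunct is equally safe.
theorem safe_cut_right (A B : Prop) (h : A ∧ B) : B :=
  h.right
```

Lemma 2 in the paper contrasts this with disjunction, where no such component extraction is truth-preserving.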

Figures (5)

  • Figure 1: The upper schema depicts the pipeline's stages: at stage 1, we extract atomic propositions from the source text; at stage 2, we extract triplets using either a dependency parser or generative LLMs; finally, we build the knowledge graph from the retrieved entities and relations. For evaluation, we restrict ourselves to the triplet entity-relation benchmark.
  • Figure 2: Prompt Template used for the distillation of the propositioner used in stage 1.
  • Figure 3: Prompt Template used for the stage 2.
  • Figure 4: Example of a sentence input processed through the whole pipeline, the atomic output, and the triplets extracted.
  • Figure 5: The Knowledge Graph built with triplets extracted by GLiREL on the atomic propositions for the above sentence.
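The two-stage pipeline of Figure 1 can be sketched as follows. This is a minimal illustration with naive stand-ins: `propositionize` here splits coordinated clauses on "and" (the paper uses the distilled MPropositionneur-V2 model), and `extract_triplets` is a toy pattern matcher (the paper uses GLiREL or a generative LLM). All function names are ours, not the paper's.

```python
# Minimal sketch of the two-stage pipeline: text -> atomic
# propositions -> (head, relation, tail) triplets -> KG edges.

def propositionize(sentence: str) -> list[str]:
    """Stage 1 stand-in: split coordinated clauses on ' and '.

    The actual stage uses a distilled Qwen3-0.6B propositioner.
    """
    return [p.strip() for p in sentence.split(" and ") if p.strip()]


def extract_triplets(proposition: str) -> list[tuple[str, str, str]]:
    """Stage 2 stand-in: match a copular 'X is Y' pattern.

    The actual stage uses GLiREL or a generative LLM.
    """
    words = proposition.split()
    if "is" in words:
        i = words.index("is")
        return [(" ".join(words[:i]), "is", " ".join(words[i + 1:]))]
    return []


def build_graph(sentence: str) -> list[tuple[str, str, str]]:
    """Run both stages; the triplets are the edges of the KG."""
    triplets: list[tuple[str, str, str]] = []
    for prop in propositionize(sentence):
        triplets.extend(extract_triplets(prop))
    return triplets


edges = build_graph(
    "Paris is the capital of France and Berlin is the capital of Germany"
)
print(edges)
```

The point of the intermediate representation is visible even in this toy: the second clause yields no triplet when the full sentence is parsed as one unit by a head-first matcher, but both clauses yield triplets once split into atomic propositions.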

Theorems & Definitions (8)

  • Definition 1: Safe Cut
  • Definition 2: Bad Cut
  • Lemma 1: Divisibility of Conjunction
  • Proof of Lemma 1
  • Lemma 2: Indivisibility of Disjunction
  • Proof of Lemma 2
  • Theorem 1: Structural Characterisation
  • Proof of Theorem 1