Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

Paul Roit, Aviv Slobodkin, Eran Hirsch, Arie Cattan, Ayal Klein, Valentina Pyatkin, Ido Dagan

TL;DR

This work reframes document-level semantic argument detection as a textual-entailment task, so that cross-sentence arguments can be identified without extensive domain-specific supervision. It constructs simple semantic hypotheses from a predicate’s in-sentence arguments and candidate phrases from other sentences, then tests each hypothesis for entailment against the full document to select valid arguments. A predicate-argument aware NLI model trained on QA-SRL data performs strongly on document-level benchmarks, often surpassing task-specific supervised baselines, while experiments with LLM prompts probe the limits of zero-shot approaches. The method yields schema-free propositions that are easy to process downstream, reveal cross-sentence semantics, and can augment SRL and event extraction pipelines, albeit at a computational cost and with a dependence on robust entailment models.

Abstract

Detecting semantic arguments of a predicate word has been conventionally modeled as a sentence-level task. The typical reader, however, perfectly interprets predicate-argument relations in a much wider context than just the sentence where the predicate was evoked. In this work, we reformulate the problem of argument detection through textual entailment to capture semantic relations across sentence boundaries. We propose a method that tests whether some semantic relation can be inferred from a full passage by first encoding it into a simple and standalone proposition and then testing for entailment against the passage. Our method does not require direct supervision, which is generally absent due to dataset scarcity, but instead builds on existing NLI and sentence-level SRL resources. Such a method can potentially explicate pragmatically understood relations into a set of explicit sentences. We demonstrate it on a recent document-level benchmark, outperforming some supervised methods and contemporary language models.
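
The hypothesis-and-entailment loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slot names, the "Someone" placeholder subject, the 0.5 threshold, the toy document, and the lookup-table scorer are all assumptions made for the example. In practice `entail_prob` would wrap an NLI model that scores the hypothesis against the full passage.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical slot names; the paper's actual hypothesis fields may differ.
POSITIONS = ("subject", "object", "oblique")


def render_hypothesis(verb: str, fields: Dict[str, str]) -> str:
    """Render the filled slots as a simple standalone sentence."""
    parts = [fields.get("subject", "Someone"), verb]
    for slot in ("object", "oblique"):
        if slot in fields:
            parts.append(fields[slot])
    return " ".join(parts) + "."


def detect_arguments(
    document: str,
    verb: str,
    local_fields: Dict[str, str],
    candidates: List[str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, str]]:
    """Try every candidate in every slot; keep hypotheses the document entails."""
    accepted = []
    for candidate in candidates:
        for slot in POSITIONS:
            fields = dict(local_fields)
            fields[slot] = candidate  # candidate fills (or overrides) this slot
            hypothesis = render_hypothesis(verb, fields)
            if entail_prob(document, hypothesis) >= threshold:
                accepted.append((candidate, slot, hypothesis))
    return accepted


# Toy stand-in for an NLI model: a lookup of hypotheses treated as entailed.
ENTAILED = {"The Smiths sold the house.", "Someone sold the house last week."}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    return 1.0 if hypothesis in ENTAILED else 0.0


document = "The Smiths put their house on the market. It was sold last week."
accepted = detect_arguments(
    document,
    verb="sold",
    local_fields={"object": "the house"},
    candidates=["The Smiths", "last week"],
    entail_prob=toy_entail_prob,
)
for candidate, slot, hypothesis in accepted:
    print(f"{candidate!r} as {slot}: {hypothesis}")
```

Passing the scorer as a function keeps the selection logic independent of any particular entailment model, which mirrors the abstract's point that the method builds on existing NLI resources rather than direct supervision.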

Paper Structure

This paper contains 26 sections, 3 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Example of semantic arguments in the sentence and document scope. The predicate is in boldface while arguments are highlighted in color. The bottom part shows four different propositions: (1) A proposition constructed from in-sentence arguments of the predicate. (2) The same proposition with additional arguments from other sentences in the document. (3) A proposition with an argument (the house) placed in an incorrect syntactic position that does not align with its original semantic role. (4) A proposition with an argument that is incorrect according to the document. Neither (3) nor (4) is supported by the document.
  • Figure 2: Left: An excerpt from the document with the predicate marked in bold in the circled sentence. Candidate phrases from across the document are highlighted in gray. For completeness, we also highlight local arguments and their co-referent mentions in color. Center-Top: A QA-SRL parser analyzes the predicate's sentence and outputs local arguments, their questions, and their syntactic positions (in purple). Center-Bottom: Argument and candidate phrases are assigned to hypothesis fields corresponding to different syntactic positions. Grammatical attributes are extracted from the question of the first local argument. The generated hypothesis sentence is shown in the box below. Right: Each candidate (highlighted in gray) is inserted into three different position fields and the resulting hypothesis is verified with an NLI model against the full document. The second candidate demonstrates two correct alternations.
  • Figure 3: Our implicit arguments annotation interface. The phrases highlighted in yellow depict the current set of arguments; phrases in gray are candidates that need to be either removed from the TODO list or selected as an answer to a QA-SRL question. The interface validates that the question is formatted correctly.
  • Figure 4: Stratification of our results on the TNE test set by the distance, counted in sentences, between the entity and the predicate. The distance of an entity from the predicate is defined as the absolute difference between the sentence index of the mention closest to the predicate and the sentence index of the predicate.
  • Figure 5: The Mistral-specific prompts are formatted both as QA generation (top) and argument extraction (bottom). Blue highlighting indicates chat instructions, green is our task-specific instruction, orange is for the query, and yellow is our example of a suitable response.
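
The entity-to-predicate distance defined in the Figure 4 caption can be computed as in the sketch below. The function name and the representation of an entity as a list of sentence indices, one per mention, are assumptions made for illustration.

```python
from typing import Iterable


def entity_distance(mention_sentence_indices: Iterable[int],
                    predicate_sentence_index: int) -> int:
    """Distance in sentences between a predicate and an entity's closest mention."""
    indices = list(mention_sentence_indices)
    if not indices:
        raise ValueError("the entity needs at least one mention")
    # The closest mention is the one minimizing the absolute index difference.
    return min(abs(i - predicate_sentence_index) for i in indices)


# An entity mentioned in sentences 0, 4 and 7, with the predicate in sentence 5.
print(entity_distance([0, 4, 7], 5))
```

A distance of 0 means some mention of the entity appears in the predicate's own sentence, so only larger values correspond to cross-sentence arguments.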