An Annotation Scheme for Factuality and its Application to Parliamentary Proceedings

Gili Goldin, Shira Wigderson, Ella Rabinovich, Shuly Wintner

TL;DR

The paper presents a comprehensive annotation scheme for factuality, developed for Hebrew and designed to identify check-worthy statements in parliamentary discourse. It integrates multiple cues (check-worthiness, claim type, factuality profile, ESPs, agency, stance, hedging, quantities, named entities, and time expressions) and applies them to 4,987 manually annotated Knesset sentences. Through extensive annotation work and an inter-annotator agreement study, the authors demonstrate the scheme's feasibility and highlight the difficulty of achieving reliable agreement on its denser annotation layers. They compare GPT-based labeling with fine-tuned Hebrew language models and find that domain-adapted fine-tuned models (notably Knesset-DictaBERT) substantially outperform the GPT baselines and can be used to automatically annotate the entire Knesset Corpus for check-worthiness. The authors release their resources (dataset, prompts, and models) publicly to support Hebrew fact-checking, and outline a path for adapting the scheme to other morphologically rich languages and domains.
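
To make the list of annotation layers concrete, the sketch below shows one possible in-memory record for a single annotated sentence. It is an illustration only: the class names, field names, and label strings are hypothetical, and the split between sentence-level labels and token-span layers is an assumption rather than the paper's actual data format. The example text is an English placeholder standing in for a Hebrew Knesset sentence.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Span:
    """Hypothetical token span: tokens [start, end) carry `label`."""
    start: int
    end: int
    label: str


@dataclass
class AnnotatedSentence:
    """Hypothetical record; assumes some layers are sentence-level labels
    and others are lists of token spans (the real scheme may differ)."""
    sentence_id: str
    text: str
    # Assumed sentence-level layers (None = not annotated).
    check_worthiness: Optional[str] = None
    claim_type: Optional[str] = None
    factuality_profile: Optional[str] = None
    agency: Optional[str] = None
    stance: Optional[str] = None
    # Assumed span-level layers.
    esps: list[Span] = field(default_factory=list)
    hedges: list[Span] = field(default_factory=list)
    quantities: list[Span] = field(default_factory=list)
    named_entities: list[Span] = field(default_factory=list)
    time_expressions: list[Span] = field(default_factory=list)

    def annotated_layers(self) -> list[str]:
        """Names of layers holding a non-empty value, in declaration order."""
        names = []
        for f in fields(self):
            if f.name in ("sentence_id", "text"):
                continue
            if getattr(self, f.name):  # skips None, "" and empty lists
                names.append(f.name)
        return names


# Placeholder sentence; tokens 3..4 are "3 percent".
s = AnnotatedSentence(
    sentence_id="demo-1",
    text="Unemployment rose by 3 percent last year",
    check_worthiness="check-worthy",  # hypothetical label value
    quantities=[Span(3, 5, "percentage")],  # hypothetical label value
)
print(s.annotated_layers())  # ['check_worthiness', 'quantities']
```

Keeping every layer optional reflects that not all cues apply to every sentence; a record like this can be filtered on `check_worthiness` to build a training set for a check-worthiness classifier.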

Abstract

Factuality assesses the extent to which a language utterance relates to real-world information; it determines whether utterances correspond to facts, possibilities, or imaginary situations, and as such, it is instrumental for fact checking. Factuality is a complex notion that relies on multiple linguistic signals, and has been studied in various disciplines. We present a complex, multi-faceted annotation scheme of factuality that combines concepts from a variety of previous works. We developed the scheme for Hebrew, but we trust that it can be adapted to other languages. We also present a set of almost 5,000 sentences in the domain of parliamentary discourse that we manually annotated according to this scheme. We report on inter-annotator agreement, and experiment with various approaches to automatically predict (some features of) the scheme, in order to extend the annotation to a large corpus.
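
The abstract reports inter-annotator agreement but does not name the statistic used. Purely as an illustration, the sketch below computes Cohen's kappa, a standard chance-corrected agreement measure for two annotators assigning one label per item; the paper may well use a different measure (for example, one suited to more than two annotators or to span-level layers). The function name and the label strings are hypothetical.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each annotator labeled independently
    # according to their own label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[lab] * cb[lab] for lab in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used the same single label everywhere;
        # kappa is formally undefined, so return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical check-worthiness labels from two annotators.
ann1 = ["cw", "cw", "not", "not"]
ann2 = ["cw", "not", "not", "not"]
print(cohens_kappa(ann1, ann2))  # 0.5
```

Here raw agreement is 0.75, but half of that is expected by chance given each annotator's label frequencies, so kappa drops to 0.5. This chance correction is why kappa-style measures are preferred over raw agreement when labels are imbalanced.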

Paper Structure

This paper contains 68 sections, 3 tables.