factgenie: A Framework for Span-based Evaluation of Generated Texts

Zdeněk Kasner; Ondřej Plátek; Patrícia Schmidtová; Simone Balloccu; Ondřej Dušek

TL;DR

factgenie addresses the need for fine-grained, span-based evaluation of generated text by supporting the collection and visualization of word-level annotations from both humans and LLMs. It provides a lightweight, self-hosted framework with a Flask-based backend, a Bootstrap/jQuery frontend, plugins for span annotation (YPet) and interactive visualizations, and ready-made data loaders and LLM API wrappers. The main contributions are a modular Dataset abstraction, end-to-end annotation campaigns for both crowdsourcing and LLMs, and an extensible visualization pipeline, all designed for rapid prototyping. The framework supports precise, transparent error analysis and can streamline evaluation campaigns for researchers and practitioners.
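The modular Dataset abstraction mentioned above can be illustrated with a minimal sketch. The names used here (Dataset, Example, load_examples, render, get_example, WeatherDataset) are hypothetical and are not taken from factgenie's codebase. The sketch assumes only that a dataset knows how to load its examples and how to render each input as HTML for the web interface.

```python
import html
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Example:
    """One input record plus the model outputs to be annotated (hypothetical structure)."""

    data: dict
    outputs: dict[str, str] = field(default_factory=dict)  # setup name -> generated text


class Dataset(ABC):
    """Hypothetical base class: each subclass decides how to load and display its inputs."""

    def __init__(self, name: str):
        self.name = name
        self.examples: dict[str, list[Example]] = {}

    @abstractmethod
    def load_examples(self, split: str) -> list[Example]:
        """Return all examples of a split, e.g. "dev" or "test"."""

    @abstractmethod
    def render(self, example: Example) -> str:
        """Return an HTML snippet visualizing the example's input data."""

    def get_example(self, split: str, idx: int) -> Example:
        # Load each split on first access and cache it for later requests.
        if split not in self.examples:
            self.examples[split] = self.load_examples(split)
        return self.examples[split][idx]


class WeatherDataset(Dataset):
    """Toy subclass; the in-memory records stand in for a real data loader."""

    def load_examples(self, split: str) -> list[Example]:
        records = [{"city": "Prague", "temp_c": 21}, {"city": "Brno", "temp_c": 19}]
        return [Example(data=record) for record in records]

    def render(self, example: Example) -> str:
        # Render the key-value input as an HTML table, escaping every cell.
        rows = "".join(
            f"<tr><th>{html.escape(str(k))}</th><td>{html.escape(str(v))}</td></tr>"
            for k, v in example.data.items()
        )
        return f"<table>{rows}</table>"


if __name__ == "__main__":
    dataset = WeatherDataset("weather")
    print(dataset.render(dataset.get_example("dev", 0)))
```

In this design, adding a new data source means writing one subclass with two methods. Caching and indexing stay in the base class, so the rest of the application only ever calls get_example and render.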

Abstract

We present factgenie: a framework for annotating and visualizing word spans in textual model outputs. Annotations can capture various span-based phenomena such as semantic inaccuracies or irrelevant text. With factgenie, the annotations can be collected both from human crowdworkers and large language models. Our framework consists of a web interface for data visualization and gathering text annotations, powered by an easily extensible codebase.
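The following minimal sketch shows how span annotations produced by an LLM might be converted into character offsets in the evaluated output. It assumes the LLM replies with JSON containing an "annotations" list whose items carry "text" and "type" fields. That response format, along with the names Span and parse_llm_annotations, is a hypothetical choice made for illustration and is not factgenie's documented interface.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int     # character offset of the span in the annotated output
    text: str      # surface form of the span
    category: int  # index into the campaign's annotation categories


def parse_llm_annotations(output_text: str, llm_response: str) -> list[Span]:
    """Map span texts quoted by an LLM evaluator to character offsets.

    Assumes a hypothetical reply format:
    {"annotations": [{"text": "...", "type": 0}, ...]}
    """
    annotations = json.loads(llm_response).get("annotations", [])
    spans: list[Span] = []
    cursor = 0
    for ann in annotations:
        span_text = ann.get("text", "")
        if not span_text:
            continue  # nothing to locate
        # Prefer the first occurrence after the previous span so that repeated
        # phrases are matched in reading order.
        start = output_text.find(span_text, cursor)
        if start == -1:
            start = output_text.find(span_text)  # fall back to the first occurrence
        if start == -1:
            continue  # quoted text is not in the output, so drop it
        spans.append(Span(start=start, text=span_text, category=int(ann["type"])))
        cursor = start + len(span_text)
    return spans


if __name__ == "__main__":
    output = "Prague will be sunny with 25 degrees. Brno will be sunny too."
    response = json.dumps({"annotations": [
        {"text": "25 degrees", "type": 0},
        {"text": "sunny", "type": 2},
        {"text": "heavy rain", "type": 1},
    ]})
    for span in parse_llm_annotations(output, response):
        print(span)
```

In the usage at the bottom, "sunny" resolves to the second occurrence (offset 51) because the search starts after "25 degrees". "heavy rain" does not occur in the output and is dropped. This is only a heuristic. If the LLM were asked to return offsets directly, the string search would not be needed.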

Paper Structure

This paper contains 5 sections and 2 figures.

Introduction
Framework
Human Annotations
LLM Annotations
Roadmap

Figures (2)

Figure 1: Elements from the factgenie user interface: (a) custom visualization of the input data, (b) the corresponding LLM output with span annotations. The highlight colors correspond to custom annotation categories defined for the annotation process: incorrect fact, fact not checkable, and misleading fact.
Figure 2: factgenie workflow. Actions needed for using factgenie for custom tasks are shown in blue rectangles.
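The span highlighting shown in Figure 1 can be approximated with a minimal sketch that wraps each annotated span in an HTML element carrying a per-category CSS class. The class names (annotation, cat-0, and so on), the Span structure, and the order of the category list are assumptions made for illustration. The actual highlight colors would come from a stylesheet that is not shown here, and this is not factgenie's own rendering code.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int     # character offset in the output text
    text: str      # annotated surface form
    category: int  # index into CATEGORIES


# Labels from Figure 1; their order (and hence the indices) is an assumption.
CATEGORIES = ["incorrect fact", "fact not checkable", "misleading fact"]


def highlight(output_text: str, spans: list[Span]) -> str:
    """Return HTML in which every span is wrapped in a per-category <span>.

    Overlaps are not supported: a span starting inside an earlier one is skipped.
    """
    parts: list[str] = []
    pos = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < pos:
            continue  # overlaps the previous highlight
        end = span.start + len(span.text)
        parts.append(html.escape(output_text[pos:span.start]))
        label = html.escape(CATEGORIES[span.category])
        parts.append(
            f'<span class="annotation cat-{span.category}" title="{label}">'
            f"{html.escape(output_text[span.start:end])}</span>"
        )
        pos = end
    parts.append(html.escape(output_text[pos:]))  # text after the last span
    return "".join(parts)


if __name__ == "__main__":
    text = "Prague will be sunny with 25 degrees."
    print(highlight(text, [Span(26, "25 degrees", 0)]))
```

Because colors are set in CSS, each campaign can define its own categories and palette without changing the rendering function.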