factgenie: A Framework for Span-based Evaluation of Generated Texts
Zdeněk Kasner, Ondřej Plátek, Patrícia Schmidtová, Simone Balloccu, Ondřej Dušek
TL;DR
factgenie addresses the need for fine-grained, span-based evaluation of generated text by collecting and visualizing word-level annotations from both humans and LLMs in a single tool. It is a lightweight, self-hosted framework with a Flask-based backend, a Bootstrap/jQuery frontend, plug-ins for span annotation (YPet) and interactive visualizations, and ready-made data loaders and LLM API wrappers. The main contributions are a modular Dataset abstraction, end-to-end annotation campaigns for both crowdsourcing and LLMs, and an extensible visualization pipeline, all designed for rapid prototyping. The framework supports precise, transparent error analysis and can streamline evaluation campaigns for researchers and practitioners.
Abstract
We present factgenie: a framework for annotating and visualizing word spans in textual model outputs. Annotations can capture various span-based phenomena such as semantic inaccuracies or irrelevant text. With factgenie, the annotations can be collected both from human crowdworkers and large language models. Our framework consists of a web interface for data visualization and gathering text annotations, powered by an easily extensible codebase.
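To make the notion of a span annotation concrete, here is a minimal Python sketch of how one might represent and display one. It is an illustration only and does not reflect factgenie's actual data model or API: the `SpanAnnotation` class, its fields, the `render_spans` helper, and the `semantic_inaccuracy` label are all hypothetical names, and spans are assumed to be non-overlapping character ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanAnnotation:
    """Hypothetical record for one annotated span (not factgenie's own schema)."""
    start: int      # character offset where the span begins (inclusive)
    end: int        # character offset where the span ends (exclusive)
    category: str   # error category, e.g. "semantic_inaccuracy" (assumed label)
    source: str     # who produced the annotation, e.g. "human" or "llm"


def render_spans(text: str, spans: list[SpanAnnotation]) -> str:
    """Wrap each annotated span in [category: ...] markers.

    Assumes spans lie within the text and do not overlap.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor or span.end > len(text) or span.start > span.end:
            raise ValueError("spans must be in range and must not overlap")
        pieces.append(text[cursor:span.start])                         # unannotated text
        pieces.append(f"[{span.category}: {text[span.start:span.end]}]")  # marked span
        cursor = span.end
    pieces.append(text[cursor:])  # remaining text after the last span
    return "".join(pieces)


# Example: a model output in which an annotator flagged one phrase.
output_text = "The city of Prague has 5 million inhabitants."
flagged = "5 million"
start = output_text.index(flagged)
spans = [
    SpanAnnotation(start=start, end=start + len(flagged),
                   category="semantic_inaccuracy", source="llm"),
]
print(render_spans(output_text, spans))
```

Keeping the annotation source in the record is one simple way to let the same structure hold both crowdworker and LLM annotations, so that they can be displayed side by side.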
