KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang, Sang Chul Jung, Sungrae Park, Teakgyu Hong
TL;DR
This work addresses the mismatch between industrial needs for Document KIE and existing evaluation metrics, which ignore grouping and correction costs. It introduces KIEval, a two-level (entity and group) evaluation framework built on group matching, with two metrics, KIEval Entity F1 and KIEval Group F1, plus a third metric, $\text{KIEval}_{\text{Aligned}}$, which expresses errors as substitution, addition, and deletion costs so that the score aligns with real-world correction costs: $\text{KIEval}_{\text{Aligned}} = \frac{TP^{\text{entity}}}{TP^{\text{entity}} + \text{Error}}$, where $\text{Error}$ is the total substitution, addition, and deletion cost. The approach is validated on SROIE, CORD, and FUNSD with diverse model families (LayoutXLM, LayoutLMv3, Donut) as well as zero-shot LLMs, demonstrating structure-aware evaluation. It also shows how KIEval supports automation-rate versus accuracy decisions through threshold-based post-processing in robotic process automation (RPA) workflows.
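As a rough illustration of how the aligned score could be computed, the following Python sketch counts errors on a flat list of (key, value) entities. The function names `aligned_counts` and `kieval_aligned`, the entity representation, and the rule that a wrong value under a key present in both ground truth and prediction counts as a single substitution are all assumptions made for this example. They are not taken from the paper, and the sketch skips the group matching step described above.

```python
from collections import Counter

# Hypothetical representation: an entity is a (key, value) pair.
Entity = tuple[str, str]


def aligned_counts(gt: list[Entity], pred: list[Entity]) -> dict[str, int]:
    """Count true positives and substitution/deletion/addition errors.

    Assumed rules (illustrative, not from the paper):
    - exact (key, value) matches are true positives;
    - an unmatched ground-truth value and an unmatched predicted value under
      the same key pair up as one substitution;
    - remaining ground-truth entities are deletions (missed),
      remaining predicted entities are additions (spurious).
    """
    gt_c, pred_c = Counter(gt), Counter(pred)
    tp_c = gt_c & pred_c  # multiset intersection = exact matches
    gt_keys = Counter(k for k, _ in (gt_c - tp_c).elements())
    pred_keys = Counter(k for k, _ in (pred_c - tp_c).elements())
    sub = sum((gt_keys & pred_keys).values())
    return {
        "tp": sum(tp_c.values()),
        "sub": sub,
        "del": sum(gt_keys.values()) - sub,
        "add": sum(pred_keys.values()) - sub,
    }


def kieval_aligned(gt: list[Entity], pred: list[Entity]) -> float:
    """TP / (TP + Error), with Error = substitutions + deletions + additions."""
    c = aligned_counts(gt, pred)
    error = c["sub"] + c["del"] + c["add"]
    denom = c["tp"] + error
    # Assumed convention: two empty documents score 1.0.
    return c["tp"] / denom if denom else 1.0


if __name__ == "__main__":
    gt = [("store_name", "ACME"), ("date", "2019-01-02"), ("total", "12.50")]
    pred = [("store_name", "ACME"), ("total", "12.80"), ("tax", "0.50")]
    print(aligned_counts(gt, pred))  # {'tp': 1, 'sub': 1, 'del': 1, 'add': 1}
    print(kieval_aligned(gt, pred))  # 0.25
```

In this sketch, a mistyped total costs one correction (a substitution) rather than the one false positive plus one false negative that a plain entity-level F1 would charge, which is one way to read the idea of tying the score to correction effort.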
Abstract
Document Key Information Extraction (KIE) is a technology that transforms valuable information in document images into structured data, and it has become an essential function in industrial settings. However, current evaluation metrics for this technology do not accurately reflect the critical attributes of its industrial applications. In this paper, we present KIEval, a novel application-centric evaluation metric for Document KIE models. Unlike prior metrics, KIEval assesses Document KIE models not only on the extraction of individual information (entities) but also on the extraction of structured information (grouping). Evaluating structured information yields an assessment of Document KIE models that better reflects how grouped information is extracted from documents in industrial settings. Because KIEval is designed with industrial applications in mind, we believe it can become a standard evaluation metric for developing or applying Document KIE models in practice. The code will be publicly available.
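The distinction between entity-level and group-level evaluation can be illustrated with a small Python sketch. Here a document is a list of groups and each group is a list of (key, value) entities; entity F1 is micro-averaged over all entities, while group F1 counts a group as correct only when an identical group appears in the prediction. This exact-match rule is a simplifying assumption standing in for KIEval's group matching, and the names `entity_f1`, `group_f1`, and the CORD-style keys in the example are illustrative only.

```python
from collections import Counter

# Hypothetical representation: a group is a list of (key, value) entities,
# and a document is a list of groups.
Entity = tuple[str, str]
Group = list[Entity]


def f1(tp: int, n_pred: int, n_gt: int) -> float:
    """Harmonic mean of precision and recall; 1.0 if both sides are empty."""
    if n_pred == 0 and n_gt == 0:
        return 1.0
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_f1(gt: list[Group], pred: list[Group]) -> float:
    """Micro F1 over all entities, ignoring which group they belong to."""
    gt_c = Counter(e for g in gt for e in g)
    pred_c = Counter(e for g in pred for e in g)
    tp = sum((gt_c & pred_c).values())
    return f1(tp, sum(pred_c.values()), sum(gt_c.values()))


def _canonical(group: Group) -> tuple[Entity, ...]:
    # Order-insensitive key so entity order inside a group does not matter.
    return tuple(sorted(group))


def group_f1(gt: list[Group], pred: list[Group]) -> float:
    """F1 over groups; assumed rule: a group is correct only on exact match."""
    gt_c = Counter(_canonical(g) for g in gt)
    pred_c = Counter(_canonical(g) for g in pred)
    tp = sum((gt_c & pred_c).values())
    return f1(tp, len(pred), len(gt))


if __name__ == "__main__":
    gt = [
        [("menu.nm", "Latte"), ("menu.price", "4.50")],
        [("menu.nm", "Bagel"), ("menu.price", "3.00")],
    ]
    # Every entity is extracted, but the prices are attached to the wrong items.
    pred = [
        [("menu.nm", "Latte"), ("menu.price", "3.00")],
        [("menu.nm", "Bagel"), ("menu.price", "4.50")],
    ]
    print(entity_f1(gt, pred))  # 1.0
    print(group_f1(gt, pred))  # 0.0
```

The example shows the failure mode that group-level evaluation targets: an entity-level score is perfect even though every grouped record is wrong.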
