FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation

Yueru He, Xueqing Peng, Yupeng Cao, Yan Wang, Lingfei Qian, Haohang Li, Yi Han, Ruoyu Xiang, Mingquan Lin, Prayag Tiwari, Jimin Huang, Guojun Xiong, Sophia Ananiadou

TL;DR

FinCriticalED introduces the first fact-level visual benchmark for financial OCR, addressing the need to preserve numerical and temporal facts in dense financial documents. It combines a 500-document, 739-fact annotated dataset with ground-truth HTML and an LLM-as-Judge evaluation pipeline to quantify factual correctness, and it evaluates state-of-the-art OCR systems, open-source vision–language models, and proprietary systems. The study finds that traditional lexical metrics fail to capture financial fidelity, that temporal facts are generally recovered more robustly than numerical ones, and that proprietary models achieve the highest factual accuracy while open models rapidly close the gap. Together, these contributions establish a rigorous, domain-focused foundation for assessing and advancing factual reliability in financial OCR and related precision-critical domains.

Abstract

We introduce FinCriticalED (Financial Critical Error Detection), a visual benchmark for evaluating OCR and vision-language models on financial documents at the fact level. Financial documents contain visually dense, table-heavy layouts in which numerical and temporal information is tightly coupled with structure. In high-stakes settings, small OCR mistakes such as sign inversion or shifted dates can lead to materially different interpretations, while traditional OCR metrics like ROUGE and edit distance capture only surface-level text similarity. FinCriticalED provides 500 image-HTML pairs annotated by experts with over seven hundred numerical and temporal facts. It makes three key contributions. First, it establishes the first fact-level evaluation benchmark for financial document understanding, shifting evaluation from lexical overlap to domain-critical factual correctness. Second, all annotations are created and verified by financial experts with strict quality control over signs, magnitudes, and temporal expressions. Third, we develop an LLM-as-Judge evaluation pipeline that performs structured fact extraction and contextual verification for visually complex financial documents. We benchmark OCR systems, open-source vision-language models, and proprietary models on FinCriticalED. Results show that although the strongest proprietary models achieve the highest factual accuracy, substantial errors remain in visually intricate numerical and temporal contexts. Through quantitative evaluation and expert case studies, FinCriticalED provides a rigorous foundation for advancing visual factual precision in financial and other precision-critical domains.
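To make the gap between lexical similarity and factual correctness concrete, the following minimal sketch compares a character-level edit distance with a naive fact-level check on a single sentence containing a sign inversion. The sentence, the fact list, and the `fact_accuracy` helper are illustrative assumptions; in particular, the verbatim-match check is only a stand-in and is not the LLM-as-Judge pipeline described in the abstract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fact_accuracy(facts: list[str], ocr_text: str) -> float:
    """Hypothetical stand-in metric: share of facts found verbatim in the OCR text."""
    if not facts:
        return 0.0
    return sum(1 for fact in facts if fact in ocr_text) / len(facts)


# Made-up example: parentheses mark a negative amount in accounting notation.
reference = "Net income (loss) for the quarter ended March 31, 2023 was (1,234) thousand."
ocr_output = "Net income (loss) for the quarter ended March 31, 2023 was 1,234 thousand."
facts = ["(1,234)", "March 31, 2023"]  # one numerical fact, one temporal fact

distance = edit_distance(reference, ocr_output)
similarity = 1 - distance / max(len(reference), len(ocr_output))
accuracy = fact_accuracy(facts, ocr_output)

print(f"edit distance: {distance}")
print(f"normalized similarity: {similarity:.3f}")
print(f"fact accuracy (stand-in): {accuracy:.2f}")
```

Dropping the two parentheses changes only two characters, so the lexical similarity stays close to 1, yet the numerical fact flips from a loss to a gain and the stand-in fact accuracy falls to one half.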

Paper Structure

This paper contains 54 sections, 11 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Illustration of financial fact extraction in FinCriticalED. Both Numerical Facts (e.g., yields, interest rates, minimum investment amounts) and Temporal Facts (e.g., adjustment periods) appear within the tabular sections of real SEC filings, requiring models to accurately identify and link fact values across table rows and columns.
  • Figure 2: Challenges in financial OCR and the FinCriticalED solution pipeline. Left: Unlike general OCR with sparse, unimodal text and simple layouts, financial documents contain dense tables, hierarchical structures, and semantically sensitive numeric values that require multimodal alignment and layout-aware reasoning. Right: FinCriticalED addresses these challenges by combining page-level images, rendered and preprocessed HTML, expert-annotated financial entities, and an LLM-as-Judge evaluation framework to assess factual OCR reliability.
  • Figure 3: Comparison of OCR and multimodal models on general OCR metrics and financial fact-level accuracy. Best results per metric are highlighted with a star ($\star$). (R1 = ROUGE-1; RL = ROUGE-L; E = Edit Distance$\downarrow$; N-FFA = numerical fact accuracy; T-FFA = temporal fact accuracy; FFA = overall financial fact accuracy).
  • Figure 4: Annotation interface used in FinCriticalED. Annotators highlight entities directly within HTML while referencing rendered page images for layout validation.
  • Figure 5: Human alignment with the LLM-as-Judge paradigm in a high-FFA case (FFA = 100%).
  • ...and 1 more figures