FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents

Yilun Zhao; Yitao Long; Yuru Jiang; Chengye Wang; Weiyuan Chen; Hongjun Liu; Yiming Zhang; Xiangru Tang; Chen Zhao; Arman Cohan

TL;DR

FinDVer can serve as a valuable benchmark for evaluating LLM capabilities in claim verification over complex, expert-domain documents. Our results show that even the current best-performing system, GPT-4o, significantly lags behind human experts.

Abstract

We introduce FinDVer, a comprehensive benchmark designed to evaluate the explainable claim verification capabilities of LLMs when understanding and analyzing long, hybrid-content financial documents. FinDVer contains 2,400 expert-annotated examples, divided into three subsets: information extraction, numerical reasoning, and knowledge-intensive reasoning, each addressing common scenarios encountered in real-world financial contexts. We assess a broad spectrum of LLMs under long-context and RAG settings. Our results show that even the current best-performing system, GPT-4o, still lags behind human experts. We further provide in-depth analysis of the long-context and RAG settings, Chain-of-Thought reasoning, and model reasoning errors, offering insights to drive future advancements. We believe that FinDVer can serve as a valuable benchmark for evaluating LLMs in claim verification over complex, expert-domain documents.
