SMARTFinRAG: Interactive Modularized Financial RAG Benchmark
Yiwei Zha
TL;DR
SMARTFinRAG tackles the challenge of evaluating financial Retrieval-Augmented Generation (RAG) systems by delivering a modular, end-to-end benchmarking platform that supports real-time document ingestion, dynamic component swapping, and an interactive demonstration UI. It combines a document-based QA generation workflow with a dual-faceted evaluation engine that reports retrieval quality via hit rate (HR), mean reciprocal rank (MRR), precision (P), recall (R), average precision (AP), and normalized discounted cumulative gain (NDCG), and generation quality via LLM-as-Judge faithfulness and relevancy. Through experiments across retriever families, LLM backends, and decoding settings, the study shows that hybrid retrievers generally improve grounding, decoding parameters exert model-specific effects, and GPT-4o offers strong but not universal performance. The platform aims to accelerate trustworthy financial NLP research and bridge the gap to production RAG systems by providing reproducible, interactive evaluation and extensible components.
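To make the retrieval metrics named above concrete, the following is a minimal sketch of their textbook definitions for a single query. It assumes binary relevance and a fixed cutoff k; this section does not specify the cutoffs or relevance grading SMARTFinRAG uses, so the function names, the cutoff, and the sample document IDs are all illustrative assumptions rather than the platform's implementation.

```python
import math
from typing import Sequence, Set


def hit_rate(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document (0.0 if none is retrieved).
    MRR is the mean of this value over all queries."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant document."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at each rank where a relevant doc appears,
    normalized by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0


# Hypothetical example: four retrieved chunk IDs, two of which are relevant.
ranked_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2"}
scores = {
    "HR@3": hit_rate(ranked_ids, relevant_ids, 3),
    "RR": reciprocal_rank(ranked_ids, relevant_ids),
    "P@3": precision_at_k(ranked_ids, relevant_ids, 3),
    "R@3": recall_at_k(ranked_ids, relevant_ids, 3),
    "AP": average_precision(ranked_ids, relevant_ids),
    "NDCG@3": ndcg_at_k(ranked_ids, relevant_ids, 3),
}
```

In a benchmark run, each per-query score would typically be averaged over all generated QA pairs, so that the mean of `reciprocal_rank` gives MRR and the mean of `average_precision` gives MAP.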
Abstract
Financial sectors are rapidly adopting language model technologies, yet evaluating specialized RAG systems in this domain remains challenging. This paper introduces SMARTFinRAG, which addresses three critical gaps in financial RAG assessment with: (1) a fully modular architecture whose components can be interchanged dynamically at runtime; (2) a document-centric evaluation paradigm that generates domain-specific QA pairs from newly ingested financial documents; and (3) an intuitive interface that bridges the divide between research and implementation. Our evaluation quantifies both retrieval efficacy and response quality, revealing significant performance variations across configurations. The platform's open-source architecture supports transparent, reproducible research while addressing practical deployment challenges faced by financial institutions implementing RAG systems.
