Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation

Thomas Cook, Richard Osuagwu, Liman Tsatiashvili, Vrynsia Vrynsia, Koustav Ghosal, Maraim Masoud, Riccardo Mattivi

TL;DR

The paper tackles the challenge of applying retrieval-augmented generation (RAG) in fintech, where domain-specific ontologies and acronym-heavy content hinder standard RAG pipelines. It introduces an agentic RAG (A-RAG) system in which an Orchestrator coordinates specialized agents for acronym resolution, sub-query generation, parallel retrieval, cross-encoder re-ranking, and QA-driven refinement, enabling iterative, domain-aware retrieval. Evaluated against a baseline RAG (B-RAG) on an enterprise fintech knowledge base, A-RAG achieves higher retrieval accuracy (62.35% vs 54.12%) and remains ahead under a broader notion of correctness that also counts semantically equivalent sources (69.41% vs 58.82%), at the cost of higher latency (5.02 s vs 0.79 s). The study indicates that structured, multi-agent pipelines improve retrieval robustness in complex, domain-specific environments, and it highlights trade-offs and avenues for future work such as adaptive agent coordination and stronger context-awareness.

Abstract

Retrieval-Augmented Generation (RAG) systems often face limitations in specialized domains such as fintech, where domain-specific ontologies, dense terminology, and acronyms complicate effective retrieval and synthesis. This paper introduces an agentic RAG architecture designed to address these challenges through a modular pipeline of specialized agents. The proposed system supports intelligent query reformulation, iterative sub-query decomposition guided by keyphrase extraction, contextual acronym resolution, and cross-encoder-based context re-ranking. We evaluate our approach against a standard RAG baseline using a curated dataset of 85 question–answer–reference triples derived from an enterprise fintech knowledge base. Experimental results demonstrate that the agentic RAG system outperforms the baseline in retrieval precision and relevance, albeit with increased latency. These findings suggest that structured, multi-agent methodologies offer a promising direction for enhancing retrieval robustness in complex, domain-specific settings.
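
The reported percentages can be related to the 85-item evaluation set with a short calculation. The sketch below assumes that every percentage is computed over all 85 triples; this page does not state that explicitly, although each figure does correspond to a whole number of questions out of 85. The dictionary keys and the helper `implied_count` are illustrative names, not taken from the paper.

```python
# Assumption: each reported percentage is a share of all 85 evaluation triples.
N_QUESTIONS = 85

reported = {
    "A-RAG retrieval accuracy": 62.35,
    "B-RAG retrieval accuracy": 54.12,
    "A-RAG accuracy incl. semantically equivalent sources": 69.41,
    "B-RAG accuracy incl. semantically equivalent sources": 58.82,
}


def implied_count(percent: float, total: int = N_QUESTIONS) -> int:
    """Return the whole number of questions whose share of `total` rounds to `percent`."""
    count = round(percent / 100 * total)
    # Sanity check: the count must reproduce the reported figure to two decimals.
    assert round(100 * count / total, 2) == percent, (percent, count)
    return count


counts = {name: implied_count(p) for name, p in reported.items()}
for name, count in counts.items():
    print(f"{name}: {count}/{N_QUESTIONS}")

# Latency trade-off reported in the TL;DR (seconds).
latency_ratio = 5.02 / 0.79
print(f"A-RAG latency is about {latency_ratio:.1f}x the B-RAG latency")
```

Under that assumption, the accuracy gap corresponds to 53 vs 46 questions, and to 59 vs 50 questions when semantically equivalent sources are accepted, while A-RAG takes roughly six times as long as B-RAG.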

Paper Structure

This paper contains 21 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Word cloud illustrating the distribution of internal knowledge artifacts in fintech. The prominence of terms such as “feature,” “status,” “model,” and “API” reflects the operational focus of internal documentation—often centered around technical specifications, product state, and integration interfaces. This concentration of tightly scoped, semi-structured information highlights the challenge of designing RAG systems that can interpret fragmented context across teams and tools, where standard SaaS-based approaches fall short due to regulatory and organizational constraints.
  • Figure 2: Overview of the B-RAG pipeline. The process follows a linear flow beginning with the user's initial query, followed by query reformulation, single-pass retrieval, summarization, and response generation.
  • Figure 3: Comparison of hybrid pipeline architectures for B-RAG and A-RAG workflows. The left panel shows B-RAG’s single-pass process, including query reformulation, retrieval, answer synthesis, and output generation without iterative refinement. The right panel illustrates A-RAG’s extended pipeline, featuring acronym resolution, sub-query generation, document re-ranking, and an answer quality assessment (QA) agent. If the QA agent assigns low confidence to the synthesized answer (e.g., below a set threshold), a feedback loop triggers sub-query generation to iteratively expand and refine retrieval.
  • Figure 4: Distribution of chunk lengths by word count. The majority of chunks are between 50 and 120 words, showing the trade-off between retrieval granularity and contextual coherence.
  • Figure 5: Top 20 most frequent terms in the corpus, ranked by raw frequency.
  • ...and 2 more figures
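
The feedback loop described in the Figure 3 caption can be sketched as a small control loop. This is a minimal toy illustration under stated assumptions, not the authors' implementation: the documents, the acronym table, the threshold value, the round cap, and every function name are hypothetical. A term-overlap score stands in for the cross-encoder, splitting on "and" stands in for sub-query generation, and term coverage stands in for the QA agent's confidence.

```python
from concurrent.futures import ThreadPoolExecutor

# Everything below is hypothetical toy data and heuristics, not the paper's system.
ACRONYMS = {"kyc": "know your customer"}
DOCS = {
    "doc-kyc": "know your customer checks verify customer identity during onboarding",
    "doc-api": "the payments api exposes model status and feature flags",
    "doc-onb": "onboarding status is tracked per customer account",
}
STOPWORDS = {"and", "the", "is", "what", "are", "for", "of"}
CONFIDENCE_THRESHOLD = 0.8  # assumed; the caption only mentions "a set threshold"
MAX_ROUNDS = 3              # assumed cap on feedback iterations


def content_terms(text):
    """Lowercased words without stopwords or surrounding punctuation."""
    words = (w.strip("?,.") for w in text.lower().split())
    return {w for w in words if w and w not in STOPWORDS}


def resolve_acronyms(query):
    """Acronym-resolution stand-in: dictionary lookup per word."""
    words = (w.strip("?,.") for w in query.lower().split())
    return " ".join(ACRONYMS.get(w, w) for w in words)


def score(query, text):
    """Toy relevance score (share of query terms present); a cross-encoder would go here."""
    q = content_terms(query)
    return len(q & content_terms(text)) / len(q) if q else 0.0


def retrieve(query):
    """Top-1 retrieval by the toy score."""
    return max(DOCS, key=lambda doc_id: score(query, DOCS[doc_id]))


def sub_queries(query):
    """Sub-query stand-in: split the query on the conjunction 'and'."""
    return [part.strip() for part in query.split(" and ") if part.strip()]


def qa_confidence(query, doc_ids):
    """QA stand-in: share of query terms covered by the retrieved documents together."""
    q = content_terms(query)
    covered = set().union(*(content_terms(DOCS[d]) for d in doc_ids))
    return len(q & covered) / len(q) if q else 0.0


def a_rag(query):
    """Resolve acronyms, retrieve, re-rank, and loop while confidence is low."""
    resolved = resolve_acronyms(query)
    queries, candidates = [resolved], []
    for round_no in range(1, MAX_ROUNDS + 1):
        with ThreadPoolExecutor() as pool:  # parallel retrieval across sub-queries
            hits = list(pool.map(retrieve, queries))
        candidates += [d for d in dict.fromkeys(hits) if d not in candidates]
        ranked = sorted(candidates, key=lambda d: score(resolved, DOCS[d]), reverse=True)
        confidence = qa_confidence(resolved, ranked)
        next_queries = sub_queries(resolved)
        # Stop when confident enough, or when decomposition yields nothing new.
        if confidence >= CONFIDENCE_THRESHOLD or next_queries == queries:
            break
        queries = next_queries
    return ranked, confidence, round_no


print(a_rag("KYC checks and API feature flags"))
```

With this toy data, the single-pass round retrieves only the KYC document and falls below the assumed threshold, so the loop decomposes the query and a second round adds the API document. A single-topic query such as "What is KYC?" stops after the first round, which mirrors the B-RAG path in the left panel.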