Improving Retrieval for RAG based Question Answering Models on Financial Documents

Spurthi Setty, Harsh Thakkar, Alyssa Lee, Eden Chung, Natan Vidra

TL;DR

This paper addresses the bottleneck in Retrieval Augmented Generation (RAG) for finance-focused QA by analyzing limitations in current retrieval pipelines and proposing concrete retrieval-centric improvements. It introduces techniques across chunking (including element-based and recursive strategies), query expansion via HyDE, metadata-based indexing, re-ranking with cross-encoders, and domain-aware embedding considerations. Evaluations on FinanceBench show that providing the correct context dramatically increases answer quality, while zero-shot retrieval enhancements offer meaningful but more modest gains, underscoring the central role of effective retrieval. The work demonstrates that robust retrieval not only improves accuracy but also enhances the reliability and applicability of LLMs in finance and other data-rich domains, with clear directions for future enhancements such as knowledge graphs and annotated embeddings.

Abstract

The effectiveness of Large Language Models (LLMs) in generating accurate responses relies heavily on the quality of the input provided, particularly when employing Retrieval Augmented Generation (RAG) techniques. RAG enhances LLMs by retrieving the most relevant text chunk(s) on which to base a response. Despite the significant advancements in LLMs' response quality in recent years, users may still encounter inaccurate or irrelevant answers; these issues often stem from suboptimal text chunk retrieval by RAG rather than from the inherent capabilities of LLMs. To improve the efficacy of LLMs, it is crucial to refine the RAG process. This paper explores the existing constraints of RAG pipelines and introduces methodologies for enhancing text retrieval. It examines strategies such as sophisticated chunking techniques, query expansion, the incorporation of metadata annotations, the application of re-ranking algorithms, and the fine-tuning of embedding algorithms. Implementing these approaches can substantially improve retrieval quality, thereby improving the overall performance and reliability of LLMs in processing and responding to queries.
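
As a rough illustration of one of the chunking strategies named above, the following is a minimal sketch of recursive chunking in plain Python. It is not the authors' implementation: the function name `recursive_chunk`, the `max_chars` limit, and the separator order are all assumptions chosen for the example. The idea is to split on the coarsest separator first (paragraph breaks) and fall back to finer ones (lines, sentences, words) only for pieces that are still too long.

```python
def recursive_chunk(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper for illustration: tries each separator in order,
    from coarse to fine, and recurses into pieces that remain too long.
    """
    if len(text) <= max_chars:
        return [text] if text.strip() else []

    for i, sep in enumerate(separators):
        if sep not in text:
            continue
        chunks, current = [], ""
        for piece in text.split(sep):
            # Try to merge the piece into the chunk being built.
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= max_chars:
                current = candidate
                continue
            # The merged chunk would be too long: flush what we have.
            if current:
                chunks.append(current)
            if len(piece) > max_chars:
                # Piece alone is too long: recurse with finer separators.
                chunks.extend(recursive_chunk(piece, max_chars, separators[i + 1:]))
                current = ""
            else:
                current = piece
        if current:
            chunks.append(current)
        return [c for c in chunks if c.strip()]

    # No separator applies: fall back to a hard character split.
    return [text[j:j + max_chars] for j in range(0, len(text), max_chars)]


sample = "Revenue grew 10%.\n\nCosts fell. Margins improved.\n\n" + "x" * 50
chunks = recursive_chunk(sample, max_chars=30)
for c in chunks:
    print(len(c), repr(c))
```

In a real pipeline the limit would more likely be measured in tokens than characters, and neighbouring chunks are often given some overlap; both are left out here to keep the sketch short.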

Paper Structure (17 sections, 3 figures)

Figures (3)

  • Figure 1: Retrieval Augmented Generation Architecture
  • Figure 2: Illustration of HyDE
  • Figure 3: Aggregate Results Plot