Enhancing Financial Report Question-Answering: A Retrieval-Augmented Generation System with Reranking Analysis

Zhiyuan Cheng, Longying Lai, Yue Liu, Kai Cheng, Xiaoxi Qi

Abstract

Financial analysts face significant challenges extracting information from lengthy 10-K reports, which often exceed 100 pages. This paper presents a Retrieval-Augmented Generation (RAG) system that answers questions about S&P 500 financial reports and evaluates the impact of neural reranking on its performance. The pipeline uses hybrid search that combines full-text and semantic retrieval, followed by an optional reranking stage based on a cross-encoder model. We conduct a systematic evaluation using the FinDER benchmark dataset, comprising 1,500 queries across five experimental groups. Results show that reranking significantly improves answer quality: 49.0 percent of answers reach a correctness score of 8 or above with reranking, compared with 33.5 percent without it, a 15.5 percentage point improvement. The share of completely incorrect answers also falls from 35.3 percent to 22.5 percent. These findings underline the critical role of reranking in financial RAG systems and show performance improvements over baseline methods through modern language models and refined retrieval strategies.

Paper Structure

This paper contains 37 sections, 1 equation, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Document Processing Pipeline. The system converts HTML reports to PDF, extracts and chunks text, generates embeddings, and stores data in SQLite (for keyword search) and FAISS (for semantic search).
  • Figure 2: Query Processing Pipeline with Reranking Ablation. The pipeline processes queries through rewriting, hybrid search (FTS + semantic), RRF fusion, optional reranking, and answer generation. The ablation study compares performance with and without the reranking stage.
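
The RRF fusion step named in Figure 2 can be illustrated with a minimal sketch of Reciprocal Rank Fusion, which merges the full-text and semantic result lists by summing 1 / (k + rank) for each chunk across the lists. The function name `rrf_fuse`, the chunk identifiers, and the constant k = 60 (a commonly used default for RRF) are assumptions made for illustration; the listing above does not state the paper's actual settings.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered from best to worst hit.
    k: smoothing constant (60 is an assumed, commonly used default).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes more for chunks it ranks higher.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical hits from the two retrievers (ids are made up).
fts_hits = ["c3", "c1", "c7"]        # full-text (keyword) search
semantic_hits = ["c1", "c4", "c3"]   # embedding-based search

fused = rrf_fuse([fts_hits, semantic_hits])
print(fused)  # ['c1', 'c3', 'c4', 'c7']
```

Chunks found by both retrievers (`c1`, `c3`) rise above chunks found by only one, which is the behavior a hybrid pipeline relies on before passing the fused candidates to an optional reranker.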