Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study

Zooey Nguyen; Anthony Annunziata; Vinh Luong; Sang Dinh; Quynh Le; Anh Hai Ha; Chanh Le; Hong An Phan; Shruti Raghavan; Christopher Nguyen

TL;DR

This study addresses the challenge of domain-specific Q&A with large language models by examining two levers: domain-specific fine-tuning and iterative reasoning. Using the FinanceBench dataset of SEC financial filings, the authors find that fine-tuning the embedding model used for indexing and retrieval yields larger accuracy gains than fine-tuning the generative LLM. Adding an OODA (Observe, Orient, Decide, Act) reasoning loop on top of retrieval-augmented generation (RAG) delivers the largest improvement, bringing Q&A outputs closer to human-expert quality. The work culminates in a structured technical design space to guide practical AI-system decisions, along with recommendations for deploying high-precision, domain-aware Q&A systems in finance and beyond.
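Fine-tuning an embedding model for retrieval is commonly done with a contrastive objective over (question, relevant passage) pairs. The summary does not say which objective the authors used, so the following is only a minimal sketch, assuming an InfoNCE loss with in-batch negatives; the function names, the temperature of 0.05, and the toy vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_batch_contrastive_loss(questions: List[Vector], passages: List[Vector],
                              temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives (illustrative, not the paper's code).

    passages[i] is the relevant passage for questions[i]; every other passage
    in the batch acts as a negative for that question.
    """
    if not questions or len(questions) != len(passages):
        raise ValueError("expected a non-empty batch of aligned pairs")
    total = 0.0
    for i, q in enumerate(questions):
        logits = [cosine(q, p) / temperature for p in passages]
        # Numerically stable log-sum-exp over all passages in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log softmax probability assigned to the correct passage.
        total += log_denominator - logits[i]
    return total / len(questions)


# Hypothetical toy embeddings for a batch of two (question, passage) pairs.
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each question points at its own passage
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each question points at the wrong passage
```

Minimizing this loss pulls each question embedding toward its own passage and away from the other passages in the batch. With the toy vectors, the aligned batch gives a loss near zero and the swapped batch a loss of about 20.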

Abstract

This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accuracy than generic models, with relatively greater gains attributable to fine-tuned embedding models. Additionally, employing reasoning iterations on top of RAG delivers an even bigger jump in performance, enabling the Q&A systems to get closer to human-expert quality. We discuss the implications of such findings, propose a structured technical design space capturing major technical components of Q&A AI, and provide recommendations for making high-impact technical choices for such components. We plan to follow up on this work with actionable guides for AI teams and further investigations into the impact of domain-specific augmentation in RAG and into agentic AI capabilities such as advanced planning and reasoning.
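To make the retrieval side of this comparison concrete, the sketch below shows where the embedding model sits in a RAG pipeline: the same embedder both indexes the document chunks and encodes the query, so replacing a generic model with a fine-tuned one changes which passages reach the LLM. The hashed bag-of-words `toy_embed` is a hypothetical stand-in for a real embedding model, and `build_index` and `retrieve` are illustrative names, not the paper's code.

```python
import math
import zlib
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[str], List[float]]
Index = List[Tuple[List[float], str]]


def toy_embed(text: str, dim: int = 256) -> List[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words.

    A real pipeline would call a generic or a domain fine-tuned model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: Sequence[str], embed: Embedder) -> Index:
    """Indexing: embed every document chunk once, ahead of query time."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(query: str, index: Index, embed: Embedder, k: int = 3) -> List[str]:
    """Retrieval: embed the query with the same model and return the top-k chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


# Hypothetical filing snippets used only to exercise the functions.
chunks = [
    "net revenue in fiscal 2022 was 10 billion dollars",
    "the board approved a quarterly dividend",
    "operating expenses rose due to higher marketing spend",
]
index = build_index(chunks, toy_embed)
top = retrieve("net revenue fiscal 2022", index, toy_embed, k=1)
```

Because the same embedder encodes both chunks and queries, switching to a fine-tuned embedding model means rebuilding the index with that model, not just changing the query encoder.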

Paper Structure (20 sections, 5 figures, 5 tables)

Introduction
Related Work
Methodology
Proposed Framework and Technical Design Space
Embedding Models for Indexing & Retrieval
Generative Models for Answer Generation
Iterative Reasoning
The Q&A AI Technical Design Space
Financial Analysis Benchmark Dataset
Evaluation Metrics
Retrieval Quality Metrics
Answer Correctness Metrics
Experiments & Results
Retrieval Quality Results
Answer Correctness Results
...and 5 more sections

Figures (5)

Figure 1: A typical OODA reasoning loop.
Figure 2: A specific implementation of OODA applied to question-answering with RAG.
Figure 3: Comparison of pure-RAG and OODA-enabled answers to a FinanceBench question.
Figure 4: A structured technical design space capturing high-impact components within question-answering systems.
Figure 5: Question difficulty categorizations for FinanceBench.
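Figures 1 and 2 describe an OODA loop wrapped around RAG. The paper's own implementation is not reproduced here; the following is a hypothetical sketch of one way such a loop could be structured, assuming caller-supplied `retrieve`, `generate`, and `next_query` functions (all illustrative names), a fixed iteration budget, and one possible mapping of the four OODA stages onto Q&A steps.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical callable signatures; a real system would back these with an
# embedding-based retriever and an LLM.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str, List[str]], str]
Planner = Callable[[str, List[str], str], str]


@dataclass
class OODAResult:
    question: str
    evidence: List[str] = field(default_factory=list)
    answer: str = ""
    iterations: int = 0


def ooda_answer(question: str, retrieve: Retriever, generate: Generator,
                next_query: Planner, max_iters: int = 3) -> OODAResult:
    """Iteratively retrieve, draft, and refine until the planner is satisfied."""
    result = OODAResult(question)
    query = question
    for _ in range(max_iters):
        result.iterations += 1
        # Observe: retrieve passages for the current query, skipping duplicates.
        for passage in retrieve(query):
            if passage not in result.evidence:
                result.evidence.append(passage)
        # Orient: synthesize a draft answer from all evidence gathered so far.
        result.answer = generate(question, result.evidence)
        # Decide: ask the planner what is still missing ("" means stop).
        query = next_query(question, result.evidence, result.answer)
        if not query:
            break
        # Act: the loop repeats with the refined query.
    return result


# Stub components standing in for a retriever and an LLM (hypothetical).
CORPUS = {
    "revenue": "FY2022 revenue was 10 billion",
    "cost": "FY2022 operating cost was 6 billion",
}


def stub_retrieve(query: str) -> List[str]:
    return [text for key, text in CORPUS.items() if key in query.lower()]


def stub_generate(question: str, evidence: List[str]) -> str:
    return " | ".join(evidence)


def stub_next_query(question: str, evidence: List[str], answer: str) -> str:
    # Request cost data if it has not been retrieved yet; otherwise stop.
    return "" if any("cost" in e for e in evidence) else "operating cost"


result = ooda_answer("What was the FY2022 operating margin given revenue?",
                     stub_retrieve, stub_generate, stub_next_query)
```

In this toy run the first pass retrieves only the revenue passage, the planner asks for operating cost, and the second pass completes the evidence before stopping. A single-shot RAG call corresponds to the same code with `max_iters=1`.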
