Contrato360 2.0: A Document and Database-Driven Question-Answer System using Large Language Models and Agents

Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz

TL;DR

Contrato360 tackles the challenge of extracting precise information from lengthy contract documents and contract management system (CMS) data by tying PDFs and structured data into an LLM-driven Q&A pipeline. The approach combines Retrieval-Augmented Generation, Text-to-SQL, prompt engineering, and multi-agent orchestration to avoid retraining the model while dynamically routing each query to the most suitable processing module. Key contributions include metadata-enhanced, section-based PDF retrieval, safe SQL querying via a LangChain agent, and role-aware prompts coupled with an agent-driven workflow. Evaluation on 75 contracts demonstrates strong performance on direct questions and promising results for indirect ones, validating the architecture for enterprise information systems. This work offers a practical, scalable framework for integrating unstructured documents and structured databases in contract management, with potential applicability to other domains.

Abstract

We present a question-and-answer (Q&A) application designed to support the contract management process by leveraging combined information from contract documents (PDFs) and data retrieved from contract management systems (database). This data is processed by a large language model (LLM) to provide precise and relevant answers. The accuracy of these responses is further enhanced through the use of Retrieval-Augmented Generation (RAG), text-to-SQL techniques, and agents that dynamically orchestrate the workflow. These techniques eliminate the need to retrain the language model. Additionally, we employed Prompt Engineering to fine-tune the focus of responses. Our findings demonstrate that this multi-agent orchestration and combination of techniques significantly improve the relevance and accuracy of the answers, offering a promising direction for future information systems.
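
The routing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper describes an LLM-based agent that chooses between document retrieval and text-to-SQL, whereas this example swaps in a plain keyword heuristic so that it runs without any model or database. The names `STRUCTURED_CUES`, `route`, and `answer`, and the two stub handlers, are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict

# Hypothetical cues suggesting an aggregate question over structured CMS data.
# Assumption: a real system would let an LLM agent make this decision.
STRUCTURED_CUES = ("how many", "total", "average", "list all")


def route(question: str) -> str:
    """Return 'sql' for aggregate-style questions, otherwise 'rag'."""
    q = question.lower()
    return "sql" if any(cue in q for cue in STRUCTURED_CUES) else "rag"


def answer(question: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch the question to the handler selected by route()."""
    return handlers[route(question)](question)


# Stub handlers standing in for the text-to-SQL and RAG pipelines.
handlers = {
    "sql": lambda q: f"[text-to-SQL] {q}",
    "rag": lambda q: f"[RAG] {q}",
}

if __name__ == "__main__":
    print(answer("How many contracts expire this year?", handlers))
    print(answer("What is the penalty clause in contract 12?", handlers))
```

Keeping the routing decision separate from the handlers means the keyword heuristic could be replaced by an agent call without changing the two pipelines behind it.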

Paper Structure

This paper contains 15 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Retrieval-Augmented Generation
  • Figure 2: Methodology Workflow Combining Different Techniques
  • Figure 3: Chunking applied to Contracts
  • Figure 4: Contracts metadata
  • Figure 5: Application architecture
  • ...and 3 more figures