Contrato360 2.0: A Document and Database-Driven Question-Answer System using Large Language Models and Agents
Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz
TL;DR
Contrato360 tackles the challenge of extracting precise information from lengthy contract documents and contract management system (CMS) data by integrating PDFs and structured data into an LLM-driven Q&A pipeline. The approach combines Retrieval-Augmented Generation, Text-to-SQL, prompt engineering, and multi-agent orchestration, which avoids re-training the model and routes each query dynamically to the most suitable processing module. Key contributions include metadata-enhanced, section-based PDF retrieval; safe SQL querying via a LangChain agent; and role-aware prompts coupled with an agent-driven workflow. An evaluation on 75 contracts shows strong performance on direct questions and promising results on indirect ones, supporting the architecture's suitability for enterprise information systems. This work offers a practical, scalable framework for integrating unstructured documents and structured databases in contract management, with potential applicability to other domains.
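To make "metadata-enhanced, section-based PDF retrieval" more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes the contract text has already been extracted from the PDF and split into sections by heading. The `Section` type, its `contract_id` and `section_title` metadata fields, and the term-overlap scoring are all hypothetical; they stand in for the embedding-based similarity search a production RAG pipeline would typically use.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    # Hypothetical metadata attached to each section-level chunk of a contract PDF.
    contract_id: str
    section_title: str
    text: str


def tokenize(s: str) -> set[str]:
    """Lowercase bag of alphanumeric terms (no stemming)."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(sections, query, contract_id=None, k=2):
    """Return up to k sections ranked by term overlap with the query.

    The optional contract_id acts as a metadata filter applied before ranking,
    and overlap with the section title is weighted double as an extra signal.
    Sections with zero overlap are dropped.
    """
    q = tokenize(query)
    candidates = [s for s in sections if contract_id is None or s.contract_id == contract_id]

    def score(s):
        return len(q & tokenize(s.text)) + 2 * len(q & tokenize(s.section_title))

    ranked = sorted(candidates, key=score, reverse=True)
    return [s for s in ranked[:k] if score(s) > 0]


# Usage with toy data (hypothetical contract IDs and clauses).
sections = [
    Section("CT-001", "Term and Duration", "The contract is valid for 12 months from signature."),
    Section("CT-001", "Payment", "Payments are due within 30 days of invoice."),
    Section("CT-002", "Payment", "Payments are due within 60 days of invoice."),
]
hits = retrieve(sections, "When are payments due?", contract_id="CT-001")
print(hits[0].section_title)  # Payment
```

Filtering on metadata before ranking keeps clauses from other contracts (here, CT-002's 60-day term) out of the context passed to the LLM, which is one plausible motivation for attaching metadata to section-level chunks.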
Abstract
We present a question-and-answer (Q&A) application designed to support the contract management process by leveraging combined information from contract documents (PDFs) and data retrieved from contract management systems (database). This data is processed by a large language model (LLM) to provide precise and relevant answers. The accuracy of these responses is further enhanced through Retrieval-Augmented Generation (RAG), text-to-SQL techniques, and agents that dynamically orchestrate the workflow. These techniques eliminate the need to retrain the language model. Additionally, we employed prompt engineering to refine the focus of responses. Our findings demonstrate that this multi-agent orchestration and combination of techniques significantly improve the relevance and accuracy of the answers, offering a promising direction for future information systems.
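The abstract describes agents that decide, per question, whether to consult the contract PDFs (via RAG) or the contract management database (via text-to-SQL). The following sketch illustrates only that control flow, under explicit assumptions. A keyword heuristic stands in for the LLM-based routing decision. The SQL string is assumed to have already been produced by a text-to-SQL step. The guard accepts only a single read-only `SELECT`. The names `route`, `run_readonly_sql`, `DB_CUES`, and the `contracts` table are hypothetical. The paper's system uses a LangChain agent, whose API is not reproduced here.

```python
import sqlite3

# Hypothetical cue phrases suggesting the answer lives in structured CMS data.
DB_CUES = {"how many", "total", "value", "amount", "count", "list all"}


def route(question: str) -> str:
    """Heuristic stand-in for an LLM routing agent: returns 'sql' or 'rag'."""
    q = question.lower()
    return "sql" if any(cue in q for cue in DB_CUES) else "rag"


def run_readonly_sql(conn: sqlite3.Connection, sql: str):
    """Execute a generated query only if it is a single SELECT statement.

    This check is deliberately conservative: it also rejects CTEs (WITH ...)
    and any statement containing an inner semicolon.
    """
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt or not stmt.lower().startswith("select"):
        raise ValueError("only a single SELECT statement is allowed")
    return conn.execute(stmt).fetchall()


# Toy database standing in for the contract management system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (id TEXT, supplier TEXT, value REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?)",
    [("CT-001", "Acme", 120000.0), ("CT-002", "Globex", 80000.0)],
)

question = "What is the total value of all contracts?"
if route(question) == "sql":
    # Assumed output of a text-to-SQL step for this question.
    rows = run_readonly_sql(conn, "SELECT SUM(value) FROM contracts;")
    print(rows)  # [(200000.0,)]
```

A string-level guard like this is only a first line of defense. A deployment would more plausibly also connect with a database role that has read-only privileges, so that a malformed generated query cannot modify contract records.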
