A Large Language Model-based Framework for Semi-Structured Tender Document Retrieval-Augmented Generation
Yilong Zhao, Daifeng Li
TL;DR
This paper tackles the challenge of generating accurate, domain-specific tender documents by combining retrieval-augmented generation with LLMs. It introduces a three-component framework: template retrieval, template filling with smart tags, and modification guided by a procurement knowledge base that combines GraphRAG, graph queries in Neo4j, and a DeepSeek-enabled LLM. The approach draws on historical tender templates, structured Word templates, and a knowledge graph to keep the generated documents compliant and coherent, and it achieves strong gains over baselines on a medical procurement dataset of 1,406 documents. The results suggest practical potential for scalable, professional procurement document generation in government contexts, with future work targeting broader corpora and more complex table handling.
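To make the three components named in the TL;DR concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the authors' implementation: the `{{tag}}` placeholder syntax, the template names, the sample field values, and every function name are hypothetical, Jaccard token overlap stands in for whatever retrieval method the paper uses, and a plain dictionary stands in for the GraphRAG and Neo4j knowledge base.

```python
import re

# Hypothetical historical templates; the {{tag}} placeholder syntax is an assumption.
TEMPLATES = {
    "medical_equipment": "Tender for {{item}}. Budget: {{budget}}. Delivery within {{days}} days.",
    "office_supplies": "Procurement of {{item}} for office use. Budget: {{budget}}.",
}

# Toy stand-in for the procurement knowledge base: item -> required clause.
KNOWLEDGE_BASE = {
    "MRI scanner": "Supplier must hold a valid medical device registration certificate.",
}


def tokens(text):
    """Lowercase alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_template(query):
    """Stage 1: pick the template whose name and body best overlap the query (Jaccard)."""
    q = tokens(query)

    def score(name):
        t = tokens(name + " " + TEMPLATES[name])
        return len(q & t) / len(q | t)

    return max(TEMPLATES, key=score)


def fill_template(template, fields):
    """Stage 2: replace each {{tag}} with its value; unknown tags are left in place."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(fields.get(m.group(1), m.group(0))),
        template,
    )


def modify_with_knowledge(draft, item):
    """Stage 3: append any compliance clause the knowledge base holds for the item."""
    clause = KNOWLEDGE_BASE.get(item)
    return f"{draft} {clause}" if clause else draft


def generate_tender(query, fields):
    """Run the three stages in order and return the final draft."""
    name = retrieve_template(query)
    draft = fill_template(TEMPLATES[name], fields)
    return modify_with_knowledge(draft, fields.get("item", ""))


if __name__ == "__main__":
    # Illustrative values only; they do not come from the paper's dataset.
    fields = {"item": "MRI scanner", "budget": "2,000,000 CNY", "days": 60}
    print(generate_tender("medical equipment tender delivery", fields))
```

Keeping the stages as separate functions mirrors the framework's separation of concerns: each stand-in can be swapped for a real retriever, a Word template engine, or a graph query without touching the other two.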
Abstract
Drafting documents in the procurement field has become progressively more complex and diverse, driven by the need to meet legal requirements, adapt to technological advancements, and address stakeholder demands. While large language models (LLMs) show potential in document generation, most lack specialized knowledge of procurement. To address this gap, we apply retrieval-augmented techniques to generate professional procurement documents that are accurate and relevant.
