Streamlining Industrial Contract Management with Retrieval-Augmented LLMs
Kristi Topollai, Tolga Dimlioglu, Anna Choromanska, Simon Odie, Reginald Hui
TL;DR
This paper addresses the automation of contract revision analysis when labeled data is scarce and unstructured legacy contracts are abundant. It proposes a retrieval-augmented generation (RAG) pipeline that combines synthetic data generation, semantic clause retrieval, and an acceptability classifier. A reward-based alignment mechanism, in the style of reinforcement learning from human feedback (RLHF), steers the generator toward revisions that are more likely to be accepted, while a human remains in the loop to preserve legal rigor. Validated on an internal utility-sector dataset, the approach achieves over 80% accuracy in both identifying and optimizing problematic revisions, demonstrating practical viability in low-resource settings. The work offers a modular, extensible framework for turning unstructured legacy contracts into actionable insights, accelerating contract revision workflows without sacrificing contractual integrity.
Abstract
Contract management involves reviewing and negotiating provisions, the individual clauses that define rights, obligations, and terms of agreement. During this process, revisions to provisions are proposed and iteratively refined, and some of these revisions may be problematic or unacceptable. Automating this workflow is challenging due to the scarcity of labeled data and the abundance of unstructured legacy contracts. In this paper, we present a modular framework designed to streamline contract management through a retrieval-augmented generation (RAG) pipeline. Our system integrates synthetic data generation, semantic clause retrieval, acceptability classification, and reward-based alignment to flag problematic revisions and generate improved alternatives. Developed and evaluated in collaboration with an industry partner, our system achieves over 80% accuracy in both identifying and optimizing problematic revisions, demonstrating strong performance under real-world, low-resource conditions and offering a practical means of accelerating contract revision workflows.
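
To make the retrieve-then-flag idea concrete, the following is a minimal sketch and not the system described in this paper. It assumes a toy bag-of-words similarity in place of a learned semantic encoder, and a nearest-neighbour vote in place of a trained acceptability classifier. The `LEGACY` clauses, their `accepted` labels, and all function names are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical legacy clauses with made-up acceptability labels.
LEGACY = [
    {"text": "Contractor shall indemnify the utility against all third-party claims.",
     "accepted": True},
    {"text": "Contractor liability is limited to the fees paid under this agreement.",
     "accepted": False},
    {"text": "Payment is due within thirty days of invoice receipt.",
     "accepted": True},
]


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=1):
    """Return the k legacy clauses most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def flag_revision(revision, corpus, k=1):
    """Flag a revision when most of its retrieved neighbours were rejected.

    Assumption: a neighbour vote replaces a trained acceptability classifier.
    """
    neighbours = retrieve(revision, corpus, k)
    rejected = sum(1 for c in neighbours if not c["accepted"])
    return rejected * 2 > len(neighbours)


if __name__ == "__main__":
    proposed = "Contractor liability shall be limited to fees paid"
    print(retrieve(proposed, LEGACY)[0]["text"])
    print("flagged:", flag_revision(proposed, LEGACY))
```

In this sketch a flagged revision would be passed to a generator for rewriting and then to a human reviewer. That step is omitted here because it depends on the specific language model and reward signal used.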
