Algorithm for Automatic Legislative Text Consolidation
Matias Etcheverry, Thibaud Real, Pauline Chavallard
TL;DR
This work tackles the time-consuming task of legislative text consolidation with a generative approach: a light quantized model fine-tuned with LoRA. It builds a public dataset of legislative triplets and develops a pipeline combining section splitting, entity recognition, and a generative consolidation model, which improves on a span-extraction baseline. On a real-world bill, the method proved practically viable, with Open-LLaMA-13b delivering competitive correctness and GPT-4 achieving higher consolidation rates at the cost of broader prompts. The study highlights the importance of data quality and model scale for reliable automated consolidation, and points to open avenues such as larger context windows and Mixture-of-LoRA-Experts for further gains.
Abstract
This study introduces a method for automating the consolidation process in a legal context, a time-consuming task traditionally performed by legal professionals. We present a generative approach that processes legislative texts to automatically apply amendments. Our method employs a light quantized generative model, fine-tuned with LoRA, to generate accurate and reliable amended texts. To the authors' knowledge, this is the first time generative models have been used for legislative text consolidation. Our dataset is publicly available on HuggingFace¹. Experimental results demonstrate a significant improvement in efficiency, offering faster updates to legal documents. A fully automated legislative text consolidation pipeline can be run in a few hours, with a success rate of more than 63% on a difficult bill.
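To make the task concrete, the following is a minimal sketch of how a consolidation example might be represented and passed to a generator. It assumes a triplet consists of an initial text, an amendment, and the consolidated result. The names `Triplet`, `build_prompt`, `consolidate`, and `toy_generate`, as well as the prompt layout, are hypothetical illustrations and are not taken from the paper. The toy generator is a rule-based stand-in for the fine-tuned model so that the example runs without any model weights.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Triplet:
    """One consolidation example (field names are illustrative assumptions)."""
    initial: str            # text in force before the amendment
    amendment: str          # amending instruction taken from the bill
    consolidated: str = ""  # amended text; empty until generated


def build_prompt(t: Triplet) -> str:
    # Hypothetical prompt layout, not the template used by the authors.
    return (
        "### Initial text:\n" + t.initial.strip() + "\n\n"
        "### Amendment:\n" + t.amendment.strip() + "\n\n"
        "### Consolidated text:\n"
    )


def consolidate(triplets: List[Triplet],
                generate: Callable[[str], str]) -> List[Triplet]:
    """Fill in `consolidated` for each triplet using any prompt-to-text function."""
    return [
        Triplet(t.initial, t.amendment, generate(build_prompt(t)).strip())
        for t in triplets
    ]


def toy_generate(prompt: str) -> str:
    # Stand-in for a generative model: it only understands amendments of the
    # form: replace "X" with "Y". Anything else returns the text unchanged.
    initial = prompt.split("### Initial text:\n")[1].split("\n\n### Amendment:")[0]
    amendment = prompt.split("### Amendment:\n")[1].split("\n\n### Consolidated text:")[0]
    match = re.search(r'replace "(.+?)" with "(.+?)"', amendment)
    if match is None:
        return initial
    return initial.replace(match.group(1), match.group(2))


example = Triplet(
    initial="The fee is 10 euros.",
    amendment='In article 1, replace "10 euros" with "12 euros".',
)
print(consolidate([example], toy_generate)[0].consolidated)
# prints: The fee is 12 euros.
```

In a real setting, `toy_generate` would be replaced by a call to the fine-tuned generative model; keeping the generator behind a plain callable lets the same loop be used with different models.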
