Algorithm for Automatic Legislative Text Consolidation

Matias Etcheverry, Thibaud Real, Pauline Chavallard

TL;DR

This work tackles the time-consuming task of legislative text consolidation with a generative approach: a light quantized model fine-tuned with LoRA. It builds a public dataset of legislative triplets and develops a pipeline combining section splitting, entity recognition, and a generative consolidation model, which improves on a span-extraction baseline. On a real-world bill, the method proves practically viable, with Open-LLaMA-13b delivering competitive correctness and GPT-4 achieving higher consolidation rates at the cost of broader prompts. The study highlights the importance of data quality and model scale for reliable automated consolidation, and points to open avenues such as larger context windows and Mixture-of-LoRA-Experts for further gains.

Abstract

This study introduces a method for automating the consolidation process in a legal context, a time-consuming task traditionally performed by legal professionals. We present a generative approach that processes legislative texts to automatically apply amendments. Our method employs a light quantized generative model, fine-tuned with LoRA, to generate accurate and reliable amended texts. To the authors' knowledge, this is the first time generative models have been used for legislative text consolidation. Our dataset is publicly available on HuggingFace. Experimental results demonstrate a significant improvement in efficiency, offering faster updates to legal documents. A fully automated pipeline for legislative text consolidation can run in a few hours, with a success rate of more than 63% on a difficult bill.

Paper Structure

This paper contains 25 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: General structure of the PLF
  • Figure 2: Number of modification sections published per year in France
  • Figure 3: Word error distributions per model per modification type
  • Figure 4: Full consolidation pipeline
  • Figure 5: Correctness rate against prompt length for Open-LLaMA-13b and GPT-4 on the same consolidation samples (49.8% of the PLF). Each dot represents one PLF consolidation sample and indicates whether it is correct. The curve at prompt length $i$ gives the rate of correct consolidations among samples with a prompt length less than $i$.
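
The cumulative curve described in the Figure 5 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' plotting code: the function names `cumulative_correctness` and `correctness_curve` are hypothetical, the toy samples are invented for the example, and prompt length is assumed to be an integer count (for instance, tokens).

```python
import math
from typing import List, Sequence, Tuple

# Each sample is (prompt_length, is_correct). The unit of prompt_length
# (tokens, characters, ...) is an assumption; the caption does not say.
Sample = Tuple[int, bool]


def cumulative_correctness(samples: Sequence[Sample], threshold: int) -> float:
    """Rate of correct consolidations among samples with prompt length < threshold.

    Returns NaN when no sample is shorter than the threshold, since the rate
    is undefined in that case.
    """
    selected = [is_correct for length, is_correct in samples if length < threshold]
    if not selected:
        return math.nan
    return sum(selected) / len(selected)


def correctness_curve(samples: Sequence[Sample], thresholds: Sequence[int]) -> List[float]:
    """Evaluate the cumulative correctness rate at each threshold."""
    return [cumulative_correctness(samples, t) for t in thresholds]


# Toy data (hypothetical, not taken from the paper).
toy_samples = [(100, True), (200, False), (300, True), (400, True)]
print(cumulative_correctness(toy_samples, 250))  # 0.5
print(cumulative_correctness(toy_samples, 500))  # 0.75
```

Because each point on the curve averages over all shorter prompts, a drop in the curve as the threshold grows indicates that the longer prompts added at that point are more often consolidated incorrectly.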