LexDrafter: Terminology Drafting for Legislative Documents using Retrieval Augmented Generation

Ashish Chouhan, Michael Gertz

TL;DR

LexDrafter is a retrieval-augmented generation framework that assists in drafting Definitions articles in EU legislation by leveraging existing EUR-Lex term definitions. It builds a document corpus and a definition-element corpus from energy-domain acts, resolves cross-references, and uses a TermRetriever plus a RAG-based generator to propose or generate definitions on demand. The approach aims to harmonize definitions across acts, reduce manual effort, and mitigate ambiguities. Experimental evaluation on energy-domain documents shows that generated definitions achieve high semantic fidelity (as indicated by BERTScore) but lower exact word overlap (BLEU), with Vicuna outperforming LLaMA-2. Future work includes automatic term identification across domains and domain-tuned evaluation metrics.

Abstract

With the increase in legislative documents at the EU, the number of new terms and their definitions is increasing as well. As per the Joint Practical Guide of the European Parliament, the Council and the Commission, terms used in legal documents shall be consistent, and identical concepts shall be expressed without departing from their meaning in ordinary, legal, or technical language. Thus, while drafting a new legislative document, having a framework that provides insights about existing definitions and helps define new terms based on a document's context will support such harmonized legal definitions across different regulations and thus avoid ambiguities. In this paper, we present LexDrafter, a framework that assists in drafting Definitions articles for legislative documents using retrieval augmented generation (RAG) and existing term definitions present in different legislative documents. For this, definition elements are built by extracting definitions from existing documents. Using definition elements and RAG, a Definitions article can be suggested on demand for a legislative document that is being drafted. We demonstrate and evaluate the functionality of LexDrafter using a collection of EU documents from the energy domain. The code for the LexDrafter framework is available at https://github.com/achouhan93/LexDrafter.

Paper Structure (14 sections, 4 figures, 1 table)


Figures (4)

  • Figure 1: Number of legal documents (with definitions) per year.
  • Figure 2: Overview of workflow to build the document and definition corpus, with an example document on "setting a framework for energy labelling…" (Celex ID 32017R1369). Legal acts based on Celex ID are extracted from the EUR-Lex platform and filtered to consider only legal acts in HTML format. The DocStruct component extracts all information from legal acts and stores it in the document corpus; the DefExtract component identifies and extracts definitions from a document. The CiteResolver component resolves citations in explanations that have references.
  • Figure 3: Overview of the Definition Generation workflow with an example document on "… production of renewable liquid and gaseous transport fuels…" (Celex ID 32023R1184). Terms to be defined are passed to the TermRetriever component to retrieve matching definition elements. 'bidding zone' is already defined in another legal act; therefore, the definition simply cites this legal act. For 'fuel producer', a definition does not exist and needs to be generated by the RAG component. The Retriever subcomponent retrieves fragments and passes these fragments along with the term to be defined to the generator to generate a definition for that term.
  • Figure 4: Histogram of the distribution of definition word lengths. Vertical lines show mean length (continuous red) and standard deviation (dotted green lines).
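
The decision flow described in the Figure 3 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the LexDrafter implementation: the names `DefinitionElement`, `retrieve_term`, `retrieve_fragments`, and `draft_definition` are hypothetical, the exact-match lookup and substring-based retriever are simple stand-ins for whatever the framework actually uses, the generator is a stub rather than an LLM call, and the Celex ID and sentences are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DefinitionElement:
    """Hypothetical record for one extracted definition."""
    term: str
    explanation: str
    celex_id: str  # act in which the term is already defined


def retrieve_term(term: str, corpus: list[DefinitionElement]) -> Optional[DefinitionElement]:
    """Assumed lookup: case-insensitive exact match on the term."""
    key = term.strip().lower()
    for element in corpus:
        if element.term.strip().lower() == key:
            return element
    return None


def retrieve_fragments(term: str, sentences: list[str], k: int = 3) -> list[str]:
    """Naive stand-in retriever: up to k sentences that mention the term."""
    key = term.lower()
    return [s for s in sentences if key in s.lower()][:k]


def draft_definition(
    term: str,
    corpus: list[DefinitionElement],
    sentences: list[str],
    generate: Callable[[str, list[str]], str],
) -> str:
    """Cite an existing definition if one is found, otherwise generate one."""
    existing = retrieve_term(term, corpus)
    if existing is not None:
        return f"'{term}': as defined in {existing.celex_id}"
    fragments = retrieve_fragments(term, sentences)
    return generate(term, fragments)


def stub_generate(term: str, fragments: list[str]) -> str:
    """Placeholder for the generator; a real system would prompt an LLM here."""
    return f"'{term}' means <generated from {len(fragments)} fragment(s)>"


# Placeholder data, invented for illustration only.
corpus = [DefinitionElement("bidding zone", "<existing explanation>", "CELEX-PLACEHOLDER")]
sentences = [
    "A fuel producer shall document the electricity used.",
    "This sentence is unrelated.",
]

print(draft_definition("bidding zone", corpus, sentences, stub_generate))
print(draft_definition("fuel producer", corpus, sentences, stub_generate))
```

Passing the generator in as a callable keeps the retrieval branch testable without a model, and it mirrors the split in the caption between terms that are only cited ('bidding zone') and terms that need a generated definition ('fuel producer').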