An Senegalese Legal Texts Structuration Using LLM-augmented Knowledge Graph

Oumar Kane, Mouhamad M. Allaya, Dame Samb, Mamadou Bousso

TL;DR

The paper addresses the challenge of accessing and understanding Senegalese legal texts by building an LLM-augmented knowledge graph from land and public-domain codes. It combines rule-based extraction with a Neo4j graph database and compares multiple LLMs for knowledge-triple extraction, identifying GPT-4o, GPT-4, and Mistral-Large as top performers under ROUGE metrics. The study yields 7,967 extracted articles, a graph of 2,872 nodes and 10,774 relationships, and a framework leveraging RAG and ReAct for an intelligent legal assistant. This work advances transparent, scalable access to legal information and provides a practical pathway for citizen and professional engagement with Senegalese jurisprudence.

Abstract

This study examines the application of artificial intelligence (AI) and large language models (LLMs) to improve access to legal texts in Senegal's judicial system. It focuses on the difficulties of extracting and organizing legal documents, which limit access to judicial information. The research extracted 7,967 articles from various legal documents, with a particular focus on the Land and Public Domain Code. A graph database containing 2,872 nodes and 10,774 relationships was developed to help visualize the interconnections within legal texts. In addition, knowledge-triple extraction techniques were applied, demonstrating the effectiveness of models such as GPT-4o, GPT-4, and Mistral-Large in identifying relationships and relevant metadata. The aim is to create a solid framework that allows Senegalese citizens and legal professionals to understand their rights and responsibilities more effectively.
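
To make the link between extracted triples and the node and relationship counts more concrete, here is a minimal sketch that builds an in-memory graph from (subject, predicate, object) triples. It is an illustration only: the study stores its graph in Neo4j, and the function name `build_graph`, the example triples, and the "belongs to" predicate are assumptions made for this example.

```python
def build_graph(triples):
    """Collect distinct nodes and relationships from (subject, predicate, object) triples."""
    nodes = set()
    relationships = set()
    for subject, predicate, obj in triples:
        # Both ends of a triple become nodes; sets drop repeats automatically.
        nodes.update((subject, obj))
        relationships.add((subject, predicate, obj))
    return nodes, relationships


# Hypothetical triples, not taken from the paper's data.
example_triples = [
    ("Article 1", "belongs to", "Law 98-03"),
    ("Article 2", "belongs to", "Law 98-03"),
    ("Article 2", "refers to", "Article 1"),
    ("Article 2", "refers to", "Article 1"),  # duplicate, counted once
]

nodes, relationships = build_graph(example_triples)
print(len(nodes), "nodes,", len(relationships), "relationships")
```

In this sketch a node is created once per distinct entity and a relationship once per distinct triple, which is one plausible way a few thousand articles can yield fewer nodes than relationships.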

Paper Structure

This paper contains 18 sections, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Hierarchies of Subdivisions. The green arrows illustrate the primary pathways of the architectural design. Architecture B pertains to the land and public domain code, along with the public markets code, whereas the majority of legal codes conform to architectures A and C.
  • Figure 2: Metadata Identification. This figure demonstrates metadata derived from the land and public domain collection. Direct properties are extracted from the text. Other metadata, including subsections and article titles, are omitted in this instance but are integral to the extraction process.
  • Figure 3: Instance of a prompt with two examples for extracting knowledge triples from legal articles. Each example includes the article content (in red), article metadata (in orange), and references extracted using LLM parsers (in green). The output consists of various triples (in blue). We provide only the Output section for the current article, allowing the LLM to predict the corresponding triples based on the examples. Each triple typically has 'the current article' as the subject, with predicates like 'refers to' or 'corresponds to,' and the object as the legal entity. The structure specifies that "article" or "articles" is followed by their respective number(s), separated by commas. It includes 'of law' plus the law number or 'of decree' plus the decree number. Consecutive article numbers are abbreviated with an ellipsis.
  • Figure 4: Neo4j Database Summary: The node and relationship types are written in French, as the legal documents were authored in that language. The database contains 2,872 nodes and 10,774 relationships.
  • Figure 5: Illustration of a Complex Graph: This diagram depicts the interrelations among decrees (green), Law 98-03 (purple), and its articles (orange). It further elucidates the connections among various articles, extending to three hierarchical levels from Law 98-03's articles.
  • ...and 3 more figures
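
The few-shot prompt layout described in the Figure 3 caption can be sketched roughly as follows. This is a hedged illustration, not the authors' prompt: the section labels (`Article:`, `Metadata:`, `References:`, `Output:`), the helper names `format_block` and `build_prompt`, the triple rendering, and all placeholder values are assumptions. Only the overall idea comes from the caption: two worked examples followed by the current article, whose Output section is left for the LLM to fill in.

```python
def format_block(item, include_output):
    """Render one article as a prompt block; omit the triples when include_output is False."""
    lines = [
        f"Article: {item['content']}",
        f"Metadata: {item['metadata']}",
        f"References: {', '.join(item['references'])}",
        "Output:",
    ]
    if include_output:
        lines += [f"({s}; {p}; {o})" for s, p, o in item["triples"]]
    return "\n".join(lines)


def build_prompt(examples, current):
    """Join the solved examples, then the current article with an empty Output section."""
    blocks = [format_block(ex, include_output=True) for ex in examples]
    blocks.append(format_block(current, include_output=False))
    return "\n\n".join(blocks)


# Placeholder inputs (hypothetical, not real legal text).
examples = [
    {
        "content": "<text of example article A>",
        "metadata": "<metadata of example article A>",
        "references": ["article 4 of law 98-03"],
        "triples": [("the current article", "refers to", "article 4 of law 98-03")],
    },
    {
        "content": "<text of example article B>",
        "metadata": "<metadata of example article B>",
        "references": ["articles 2, 7 of law 98-03"],
        "triples": [("the current article", "refers to", "articles 2, 7 of law 98-03")],
    },
]
current = {
    "content": "<text of the article to process>",
    "metadata": "<metadata of the article to process>",
    "references": ["article 1 of law 98-03"],
}

prompt = build_prompt(examples, current)
print(prompt)
```

The resulting string ends with a bare `Output:` line, so the model's completion is expected to be the triples for the current article, following the pattern of the two examples.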