ALKAFI-LLAMA3: Fine-Tuning LLMs for Precise Legal Understanding in Palestine

Rabee Qasem, Mohannad Hendi, Banan Tantour

TL;DR

The paper tackles the scarcity of AI-enabled legal guidance in Palestine by proposing a cost-effective workflow: fine-tuning a 4-bit quantized Llama-3.2-1B-Instruct model on a synthetic Palestinian legal dataset derived from Official Gazette texts. It demonstrates that a compact model can be trained locally to provide contextually relevant legal interpretations, supported by a sizable Arabic QA dataset (243,841 records, ~5 million words) generated from legal articles. Training and evaluation loss both decrease over the run, reaching final values of $0.33$ and $0.31$, respectively, which suggests the model is fitting the legal content without obvious overfitting. The work contributes a publicly available Palestinian legal dataset, a replication-friendly fine-tuning pipeline, and a pathway for deploying AI-assisted legal tools in resource-constrained settings, while noting challenges in calculation-based queries and structured list formatting.
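To put the quoted numbers in more familiar terms, the short sketch below converts the final losses to perplexity and estimates the average record length. It assumes the reported losses are mean per-token cross-entropy in nats (the usual convention for causal language model fine-tuning, but not stated in this summary), and it treats "~5 million words" as exactly 5,000,000, so the per-record figure is only approximate.

```python
import math

# Figures quoted in the TL;DR.
train_loss = 0.33
eval_loss = 0.31
records = 243_841
total_words = 5_000_000  # "~5 million words": an approximation, not an exact count


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity from a loss, ASSUMING it is mean per-token cross-entropy in nats."""
    return math.exp(mean_nll_nats)


train_ppl = perplexity(train_loss)
eval_ppl = perplexity(eval_loss)
words_per_record = total_words / records

print(f"train perplexity ~ {train_ppl:.2f}")
print(f"eval perplexity  ~ {eval_ppl:.2f}")
print(f"words per record ~ {words_per_record:.1f}")
```

Under these assumptions the losses correspond to perplexities of about 1.39 (train) and 1.36 (eval), and each QA record averages roughly 20 words. Perplexity on synthetic QA pairs measures fit to that data only; it says nothing by itself about legal correctness.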

Abstract

Large Language Models (LLMs) have demonstrated remarkable potential in diverse domains, yet their application in the legal sector, particularly in low-resource contexts, remains limited. This study addresses the challenges of adapting LLMs to the Palestinian legal domain, where political instability, fragmented legal frameworks, and limited AI resources hinder effective machine-learning applications. We present a fine-tuned model based on a quantized version of Llama-3.2-1B-Instruct, trained on a synthetic dataset derived from Palestinian legal texts. Using smaller-scale models and strategically generated question-answer pairs, we achieve a cost-effective, locally sustainable solution that provides accurate and contextually relevant legal guidance. Our experiments demonstrate promising performance on various query types, ranging from yes/no questions and narrative explanations to complex legal differentiations, while highlighting areas for improvement, such as handling calculation-based inquiries and structured list formatting. This work provides a pathway for the deployment of AI-driven legal assistance tools tailored to the needs of resource-constrained environments.
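A rough back-of-the-envelope calculation illustrates why a quantized small model fits local hardware. The sketch below estimates the memory needed for the weights alone at several precisions. The parameter count is a nominal assumption taken from the "1B" in the model name, not a figure from the paper, and the estimate ignores quantization metadata, activations, optimizer state, and any adapter weights.

```python
# ASSUMPTION: nominal parameter count implied by the "1B" model name;
# the exact count is not given in this summary.
PARAMS = 1_000_000_000


def weight_gib(n_params: int, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GiB, at a given precision."""
    return n_params * bits_per_weight / 8 / 2**30


precisions = [("32-bit", 32), ("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]
for label, bits in precisions:
    print(f"{label:>6}: {weight_gib(PARAMS, bits):.2f} GiB")
```

Under this assumption the weights take roughly 1.86 GiB at 16-bit and about 0.47 GiB at 4-bit, a fourfold reduction. Actual memory use during fine-tuning will be higher than the weight-only estimate.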

Paper Structure

This paper contains 20 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: Boxplot of the token length with 90th Percentile.
  • Figure 2: Distribution of the token length across the dataset.
  • Figure 3: Training loss progression plotted against global steps during the training process.
  • Figure 4: Boxplot illustrating the distribution of training loss across epochs.
  • Figure 5: Evaluation loss progression at the end of each epoch.
  • ...and 5 more figures