Structured Definitions and Segmentations for Legal Reasoning in LLMs: A Study on Indian Legal Data
Mann Khatri, Mirza Yusuf, Rajiv Ratn Shah, Ponnurangam Kumaraguru
TL;DR
This paper tackles domain adaptation for legal judgment prediction (LJP) on Indian data by probing prompt engineering strategies that structure documents and reasoning. It introduces three components: D (definitions of legal terms), R (rhetorical roles used to segment text), and C (court-like chain reasoning). It evaluates their zero-shot impact across multiple Indian datasets, including the Kalamkar/Bambroo corpora and Predex. Defining legal terms and segmenting documents often improve F1 by roughly 1.5 to 4.36 percentage points over the baseline, while combining all three components does not always yield the best results; performance depends on the model and the dataset. The work shows that carefully designed, structure-aware prompts enable stronger zero-shot LJP from LLMs such as Llama and o3-mini, offering practical guidance for deployment and a foundation for future few-shot or fine-tuning efforts in the legal domain.
Abstract
Large Language Models (LLMs), trained on extensive datasets from the web, exhibit remarkable general reasoning skills. Despite this, they often struggle in specialized areas like law, mainly because they lack domain-specific pretraining. The legal field presents unique challenges, as legal documents are generally long and intricate, making it hard for models to process the full text efficiently. Previous studies have examined in-context approaches to address the knowledge gap, boosting model performance in new domains without full domain alignment. In our paper, we analyze model behavior on legal tasks by conducting experiments in three areas: (i) reorganizing documents based on rhetorical roles to assess how structured information affects long context processing and model decisions, (ii) defining rhetorical roles to familiarize the model with legal terminology, and (iii) emulating the step-by-step reasoning of courts regarding rhetorical roles to enhance model reasoning. These experiments are conducted in a zero-shot setting across three Indian legal judgment prediction datasets. Our results reveal that organizing data or explaining key legal terms significantly boosts model performance, with a minimum increase of ~1.5% and a maximum improvement of 4.36% in F1 score compared to the baseline.
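The three prompt components described above can be pictured as optional blocks in a zero-shot prompt. The following is a minimal sketch, not the authors' implementation: the function `build_prompt`, the role names, the definition texts, the reasoning instruction, and the ACCEPTED/REJECTED label wording are all hypothetical placeholders chosen for illustration, and the sketch assumes each sentence has already been tagged with a rhetorical role.

```python
from typing import Dict, List, Tuple

# Hypothetical rhetorical roles with placeholder definitions (component D).
# These are illustrative only, not the label set or wording used in the paper.
ROLE_DEFINITIONS: Dict[str, str] = {
    "FACTS": "Events and circumstances that led to the case.",
    "ARGUMENTS": "Claims advanced by the parties.",
    "ANALYSIS": "The court's discussion of the facts and the law.",
}

# Hypothetical instruction imitating court-style stepwise reasoning (component C).
COURT_STEPS = (
    "Reason step by step: first establish the facts, then weigh the "
    "arguments, then apply the analysis before deciding."
)


def build_prompt(
    segments: List[Tuple[str, str]],
    use_d: bool = False,
    use_r: bool = False,
    use_c: bool = False,
) -> str:
    """Assemble a zero-shot prompt from (role, sentence) pairs."""
    parts = ["Predict the outcome of the following case."]
    if use_d:
        # D: explain each rhetorical role to the model.
        parts.append("Definitions:")
        for role, definition in ROLE_DEFINITIONS.items():
            parts.append(f"- {role}: {definition}")
    if use_r:
        # R: regroup sentences by role, in the order of ROLE_DEFINITIONS.
        for role in ROLE_DEFINITIONS:
            text = " ".join(s for r, s in segments if r == role)
            if text:
                parts.append(f"[{role}] {text}")
    else:
        # Baseline: keep the document in its original order, unlabelled.
        parts.append(" ".join(s for _, s in segments))
    if use_c:
        parts.append(COURT_STEPS)
    parts.append("Answer with ACCEPTED or REJECTED.")
    return "\n".join(parts)


if __name__ == "__main__":
    example = [
        ("ANALYSIS", "The court finds the notice defective."),
        ("FACTS", "The appellant was dismissed from service."),
    ]
    print(build_prompt(example, use_d=True, use_r=True, use_c=True))
```

Keeping each component behind its own flag makes it easy to compare the baseline against any subset of D, R, and C, which matches the observation that using all three together is not always the best configuration.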
