TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context

Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya

TL;DR

This work introduces TathyaNyaya, the largest fact-centric dataset for Fact-based Judgment Prediction and Explanation (FJPE) in the Indian legal context, drawing judgments from the Supreme Court of India and various High Courts. It couples this dataset with FactLegalLlama, an instruction-tuned LLaMa-3-8B model trained on NyayaFacts to deliver both predictions and grounded explanations from factual inputs. The paper demonstrates that explicit fact extraction, paraphrasing, and fact/non-fact retrieval (NyayaFacts, NyayaScrape, NyayaSimplify, NyayaFilter) improve explainability and, in some settings, predictive performance, while expert evaluations affirm the quality of the explanations. Overall, the work highlights the importance of domain-specific, fact-focused tuning and preprocessing for transparent AI-assisted legal decision-making, and it provides a foundation for future cross-jurisdictional and multilingual extensions.

Abstract

In the landscape of Fact-based Judgment Prediction and Explanation (FJPE), reliance on factual data is essential for developing robust and realistic AI-driven decision-making tools. This paper introduces TathyaNyaya, the largest annotated dataset for FJPE tailored to the Indian legal context, encompassing judgments from the Supreme Court of India and various High Courts. Derived from the Hindi terms "Tathya" (fact) and "Nyaya" (justice), the TathyaNyaya dataset is uniquely designed to focus on factual statements rather than complete legal texts, reflecting real-world judicial processes where factual data drives outcomes. Complementing this dataset, we present FactLegalLlama, an instruction-tuned variant of the LLaMa-3-8B Large Language Model (LLM), optimized for generating high-quality explanations in FJPE tasks. Finetuned on the factual data in TathyaNyaya, FactLegalLlama integrates predictive accuracy with coherent, contextually relevant explanations, addressing the critical need for transparency and interpretability in AI-assisted legal systems. Our methodology combines transformers for binary judgment prediction with FactLegalLlama for explanation generation, creating a robust framework for advancing FJPE in the Indian legal domain. TathyaNyaya not only surpasses existing datasets in scale and diversity but also establishes a benchmark for building explainable AI systems in legal analysis. The findings underscore the importance of factual precision and domain-specific tuning in enhancing predictive performance and interpretability, positioning TathyaNyaya and FactLegalLlama as foundational resources for AI-assisted legal decision-making.
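
As a rough illustration of the instruction-tuned setup described in the abstract, the sketch below builds a prompt from case facts and parses a combined decision-plus-explanation response. The prompt wording, the `Decision:`/`Explanation:` output layout, the label meaning (1 = accepted, 0 = rejected), and all function names are assumptions made for illustration; the paper's actual template is not given in this section.

```python
from dataclasses import dataclass

# Hypothetical instruction text; not taken from the paper.
INSTRUCTION = (
    "You are given the facts of a court case. "
    "Predict the outcome as 1 (accepted) or 0 (rejected), then explain why."
)


@dataclass
class FJPEOutput:
    decision: int      # binary judgment label
    explanation: str   # free-text rationale


def build_prompt(facts: str) -> str:
    """Wrap the factual statements in an instruction-style prompt."""
    return (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Facts:\n{facts.strip()}\n\n"
        "### Response:\n"
    )


def parse_response(text: str) -> FJPEOutput:
    """Parse 'Decision: <0|1>' on the first line, then an explanation."""
    head, _, rest = text.strip().partition("\n")
    key, _, label = head.partition(":")
    label = label.strip()
    if key.strip().lower() != "decision" or label not in {"0", "1"}:
        raise ValueError(f"unexpected decision line: {head!r}")
    explanation = rest.strip()
    if explanation.lower().startswith("explanation:"):
        explanation = explanation[len("explanation:"):].strip()
    return FJPEOutput(int(label), explanation)
```

In use, `build_prompt(facts)` would be sent to the fine-tuned model and the generated text passed to `parse_response`, so that the binary label can be scored separately from the explanation.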

Paper Structure

This paper contains 42 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: A high-level illustration of the TathyaNyaya dataset creation pipeline, showcasing the development process and interconnections of its four components.
  • Figure 2: Illustration of the Fact-based Judgment Prediction and Explanation (FJPE) pipeline using the FactLegalLlama model.
  • Figure 3: Training dynamics of FactLegalLlama for the combined judgment prediction and explanation task. The model learns to produce both the outcome and its underlying rationale directly from factual inputs, guided by instruction-based fine-tuning.
  • Figure 4: Overview of the simplification and fine-tuning process. First, complex legal facts are paraphrased into simpler language using LLaMa-3-70B, creating the NyayaSimplify dataset; this is followed by supervised fine-tuning (SFT) of LLaMa-3-8B for the FJPE task.
  • Figure 5: The Fact vs. Non-Fact segmentation framework employing a BiLSTM-CRF model. This segmentation step separates factual statements from non-factual content in legal judgments, creating the NyayaFilter dataset. The refined dataset is subsequently used for downstream judgment prediction and explanation tasks.
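
To make the segmentation step in Figure 5 more concrete, here is a minimal sketch of the decoding stage of a CRF over sentence labels, followed by the filtering that keeps only factual sentences. In a BiLSTM-CRF the per-sentence emission scores would come from the BiLSTM and the transition scores would be learned; here both are hand-picked toy numbers, and the label names and function names are assumptions for illustration only.

```python
LABELS = ("FACT", "NON_FACT")


def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j for sentence t (toy stand-in for BiLSTM output)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(LABELS)
    scores = list(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_scores, ptrs = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at this step.
            best_i = max(range(n_labels), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda i: scores[i])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [LABELS[i] for i in path]


def keep_facts(sentences, labels):
    """Filter a judgment down to the sentences tagged as factual."""
    return [s for s, lab in zip(sentences, labels) if lab == "FACT"]


# Toy example: four sentences with made-up scores.
sentences = ["s0", "s1", "s2", "s3"]
emissions = [[2.0, 0.0], [0.4, 0.6], [1.5, 0.0], [0.0, 2.0]]
transitions = [[0.5, -0.5], [-0.5, 0.5]]  # staying in the same label is favored
labels = viterbi(emissions, transitions)
facts = keep_facts(sentences, labels)
```

With these toy scores the second sentence is tagged `FACT` even though its own emission slightly favors `NON_FACT`, because the transition scores reward contiguous runs of the same label. That smoothing over neighboring sentences is the usual motivation for placing a CRF layer on top of per-sentence scores.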