TathyaNyaya and FactLegalLlama: Advancing Factual Judgment Prediction and Explanation in the Indian Legal Context
Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Shivam Mishra, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya
TL;DR
This work introduces TathyaNyaya, the largest fact-centric dataset for Fact-based Judgment Prediction and Explanation (FJPE) in the Indian legal context, drawing judgments from the Supreme Court of India and various High Courts. It pairs this dataset with FactLegalLlama, an instruction-tuned LLaMa-3-8B model trained on NyayaFacts to deliver both predictions and grounded explanations from factual inputs. The paper demonstrates that explicit fact extraction, paraphrasing, and fact/non-fact retrieval (NyayaFacts, NyayaScrape, NyayaSimplify, NyayaFilter) improve explainability and, in some settings, predictive performance, while expert evaluations affirm the quality of the explanations. Overall, the work highlights the importance of domain-specific, fact-focused tuning and preprocessing for transparent AI-assisted legal decision-making, and it provides a foundation for future cross-jurisdictional and multilingual extensions.
Abstract
In the landscape of Fact-based Judgment Prediction and Explanation (FJPE), reliance on factual data is essential for developing robust and realistic AI-driven decision-making tools. This paper introduces TathyaNyaya, the largest annotated dataset for FJPE tailored to the Indian legal context, encompassing judgments from the Supreme Court of India and various High Courts. Named after the Hindi terms "Tathya" (fact) and "Nyaya" (justice), the TathyaNyaya dataset is uniquely designed to focus on factual statements rather than complete legal texts, reflecting real-world judicial processes where factual data drives outcomes. Complementing this dataset, we present FactLegalLlama, an instruction-tuned variant of the LLaMa-3-8B Large Language Model (LLM), optimized for generating high-quality explanations in FJPE tasks. Fine-tuned on the factual data in TathyaNyaya, FactLegalLlama integrates predictive accuracy with coherent, contextually relevant explanations, addressing the critical need for transparency and interpretability in AI-assisted legal systems. Our methodology combines transformers for binary judgment prediction with FactLegalLlama for explanation generation, creating a robust framework for advancing FJPE in the Indian legal domain. TathyaNyaya not only surpasses existing datasets in scale and diversity but also establishes a benchmark for building explainable AI systems in legal analysis. The findings underscore the importance of factual precision and domain-specific tuning in enhancing predictive performance and interpretability, positioning TathyaNyaya and FactLegalLlama as foundational resources for AI-assisted legal decision-making.
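
The two-stage methodology described in the abstract (a transformer classifier for binary judgment prediction, followed by an instruction-tuned model for explanation) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the label names, the prompt wording, and every function name here (`build_explanation_prompt`, `run_fjpe`, `toy_classifier`, `toy_generator`) are hypothetical, and the two toy functions merely stand in for a fine-tuned classifier and for FactLegalLlama.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed label names; the source only states that the prediction is binary.
LABELS = {0: "rejected", 1: "accepted"}


@dataclass
class FJPEResult:
    label: int
    explanation: str


def build_explanation_prompt(facts: List[str], label: int) -> str:
    """Assemble an instruction-style prompt from case facts and a predicted label.

    The template is hypothetical; the paper's actual prompt is not given here.
    """
    fact_block = "\n".join(
        f"{i}. {fact.strip()}" for i, fact in enumerate(facts, start=1)
    )
    return (
        "### Instruction:\n"
        "Using only the facts of the case, explain why the appeal is likely "
        f"to be {LABELS[label]}.\n\n"
        f"### Facts:\n{fact_block}\n\n"
        "### Explanation:\n"
    )


def run_fjpe(
    facts: List[str],
    classify: Callable[[str], int],
    generate: Callable[[str], str],
) -> FJPEResult:
    """Stage 1: predict a binary outcome. Stage 2: generate an explanation."""
    label = classify(" ".join(facts))
    if label not in LABELS:
        raise ValueError(f"classifier returned a non-binary label: {label!r}")
    prompt = build_explanation_prompt(facts, label)
    return FJPEResult(label=label, explanation=generate(prompt))


def toy_classifier(text: str) -> int:
    # Stand-in for a fine-tuned transformer classifier (keyword rule, demo only).
    return 1 if "without notice" in text.lower() else 0


def toy_generator(prompt: str) -> str:
    # Stand-in for an instruction-tuned LLM; it only reports the prompt size.
    return f"Explanation generated from a prompt of {len(prompt)} characters."


if __name__ == "__main__":
    case_facts = [
        "The appellant was dismissed without notice.",
        "No departmental inquiry was held.",
    ]
    result = run_fjpe(case_facts, toy_classifier, toy_generator)
    print(LABELS[result.label])
    print(result.explanation)
```

Passing the classifier and generator in as callables keeps the sketch independent of any particular model library, so a real classifier or language model could be substituted without changing `run_fjpe`.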
