Human Centered AI for Indian Legal Text Analytics
Sudipto Ghosh, Devanshu Verma, Balaji Ganesan, Purnima Bindal, Vikas Kumar, Vasudha Bhatnagar
TL;DR
This paper argues that powerful LLMs alone fall short of delivering trustworthy legal AI, especially for India, and proposes a human-centered, compound AI system for Legal Text Analytics (LTA). It introduces a new Indian legal dataset and a legal knowledge graph, and presents five LTA tasks (Case Similarity, Judgment Summarization, Petition Drafting, Question Answering, and Text2SQL), each designed to leverage human input and external knowledge to reduce hallucinations and improve usefulness for practitioners and self-represented litigants. The work details an InLegalLLaMA research direction that combines domain-specific pre-training, knowledge infusion, and instruction-tuning to build a robust Indian legal model; it also outlines practical components such as a PIL-guideline-driven petition drafting workflow and retrieval-augmented generation for evidence-backed queries. Overall, the approach aims to democratize legal knowledge and speed up the delivery of justice by aligning AI capabilities with human expertise and real-world legal processes.
Abstract
Legal research is a crucial task in the practice of law. It requires intense human effort and intellectual prudence to research a legal case and prepare arguments. The recent boom in generative AI has not translated into a proportionate rise in impactful legal applications, because of low trustworthiness and the scarcity of specialized datasets for training Large Language Models (LLMs). This position paper explores the potential of LLMs within Legal Text Analytics (LTA), highlighting specific areas where the integration of human expertise can significantly enhance their performance to match that of experts. We introduce a novel dataset and describe a human-centered, compound AI system that principally incorporates human inputs for performing LTA tasks with LLMs.
