ReGal: A First Look at PPO-based Legal AI for Judgment Prediction and Summarization in India

Shubham Kumar Nigam, Tanuj Tyagi, Siddharth Shukla, Aditya Kumar Guru, Balaramamahanthi Deepak Patnaik, Danush Khanna, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya

TL;DR

This work pilots ReGal, a PPO-based reinforcement learning approach for Indian legal NLP, applying RLHF/RLAIF to two core tasks: court judgment prediction with rationale explanations and abstractive legal summarization. The model is first instruction-tuned with supervision on CJPE and In-Abs, then fine-tuned with PPO using AI-generated rewards, to investigate whether RL can align outputs with legal reasoning. ReGal underperforms strong supervised baselines and commercial LLMs, which exposes challenges in reward design, domain adaptation, and factual fidelity and points to directions for future improvement. The paper offers methodological insights and a foundation for building more robust, interpretable, and domain-adapted RL-driven legal AI pipelines.

Abstract

This paper presents an early exploration of reinforcement learning methodologies for legal AI in the Indian context. We introduce Reinforcement Learning-based Legal Reasoning (ReGal), a framework that integrates Multi-Task Instruction Tuning with Reinforcement Learning from AI Feedback (RLAIF) using Proximal Policy Optimization (PPO). Our approach is evaluated across two critical legal tasks: (i) Court Judgment Prediction and Explanation (CJPE), and (ii) Legal Document Summarization. Although the framework underperforms on standard evaluation metrics compared to supervised and proprietary models, it provides valuable insights into the challenges of applying RL to legal texts. These challenges include reward model alignment, legal language complexity, and domain-specific adaptation. Through empirical and qualitative analysis, we demonstrate how RL can be repurposed for high-stakes, long-document tasks in law. Our findings establish a foundation for future work on optimizing legal reasoning pipelines using reinforcement learning, with broader implications for building interpretable and adaptive legal AI systems.

Paper Structure

This paper contains 25 sections, 1 equation, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Overview of the ReGal PPO model training process.