Multi-Source Retrieval and Reasoning for Legal Sentencing Prediction
Junjie Chen, Haitao Li, Qilei Zhang, Zhenghua Li, Ya Zhang, Quan Zhou, Cheng Luo, Yiqun Liu, Dongsheng Guo, Qingyao Ai
TL;DR
This work tackles Legal Sentencing Prediction (LSP) by introducing $MSR^2$, a reinforcement learning–driven framework that performs routable multi-source retrieval and employs a process-level reward to supervise intermediate reasoning. By integrating sources such as statutes, judicial interpretations, and sentencing guidelines directly into the reasoning process, the model achieves more accurate and interpretable sentencing decisions than prior semantic- or logic-focused approaches. Training with Group Relative Policy Optimization (GRPO) further stabilizes learning by using group-based baselines and masking retrieved tokens during optimization. Experiments on CAIL2018 and CJO22 show state-of-the-art performance, and qualitative case studies illustrate transparent, source-grounded reasoning paths.
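The two GRPO ingredients named above, group-based baselines and masking of retrieved tokens, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the population standard deviation, the `eps` constant, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts sampled for the same case.

    The group mean acts as the baseline; dividing by the group standard
    deviation is an assumed normalization choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_token_mean(per_token_values, retrieved_mask):
    """Average per-token values over model-generated tokens only.

    retrieved_mask[i] is True when token i was inserted from a retrieved
    source (hypothetical convention), so it is excluded from the average.
    """
    kept = [v for v, is_retrieved in zip(per_token_values, retrieved_mask)
            if not is_retrieved]
    return sum(kept) / len(kept) if kept else 0.0


# Toy example: four rollouts for one case, with made-up rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])

# Toy example: tokens 1 and 2 came from a retrieved statute and are ignored.
loss_term = masked_token_mean([1.0, 2.0, 3.0, 4.0], [False, True, True, False])
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one, so no separate value network is needed for the baseline. The mask keeps the optimizer from being credited or penalized for text the model did not generate.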
Abstract
Legal judgment prediction (LJP) aims to predict judicial outcomes from case facts and typically includes law article, charge, and sentencing prediction. While recent methods perform well on the first two subtasks, legal sentencing prediction (LSP) remains difficult because it requires both fine-grained objective knowledge and flexible subjective reasoning. To address these challenges, we propose $MSR^2$, a framework that integrates multi-source retrieval and reasoning in large language models (LLMs) with reinforcement learning. $MSR^2$ enables LLMs to perform multi-source retrieval based on reasoning needs and applies a process-level reward to guide intermediate subjective reasoning steps. Experiments on two real-world datasets show that $MSR^2$ improves both accuracy and interpretability in LSP, providing a promising step toward practical legal AI. Our code is available at https://anonymous.4open.science/r/MSR2-FC3B.
