BayesJudge: Bayesian Kernel Language Modelling with Confidence Uncertainty in Legal Judgment Prediction

Ubaid Azam, Imran Razzak, Shelly Vishwakarma, Hakim Hacid, Dell Zhang, Shoaib Jameel

TL;DR

This paper introduces BayesJudge, a Bayesian kernel Monte Carlo dropout framework for legal judgment prediction that explicitly quantifies prediction uncertainty. By combining kernel-based data modelling with Beta priors over dropout probabilities, BayesJudge delivers calibrated predictive distributions and improved performance on public legal datasets, including ECHR and Overruling, across zero- to full-data regimes and under adversarial paraphrasing. The approach automatically identifies unreliable predictions and automates their scrutiny: a practical optimal-solution technique rewrites uncertain cases as linguistically richer, legally styled text, raising accuracy on those cases by up to 27%. These results highlight the potential for trustworthy, transparent legal AI that supports judges and practitioners, and suggest broader applicability to other high-stakes domains that require calibrated uncertainty estimates.

Abstract

Predicting legal judgments with reliable confidence is paramount for responsible legal AI applications. While transformer-based deep neural networks (DNNs) like BERT have demonstrated promise in legal tasks, accurately assessing their prediction confidence remains crucial. We present a novel Bayesian approach called BayesJudge that harnesses the synergy between deep learning and deep Gaussian Processes to quantify uncertainty through Bayesian kernel Monte Carlo dropout. Our method leverages informative priors and flexible data modelling via kernels, surpassing existing methods in both predictive accuracy and confidence estimation, as indicated by the Brier score. Extensive evaluations on public legal datasets showcase our model's superior performance across diverse tasks. We also introduce an optimal solution to automate the scrutiny of unreliable predictions, resulting in a significant increase in the accuracy of the model's predictions by up to 27%. By empowering judges and legal professionals with more reliable information, our work paves the way for trustworthy and transparent legal AI applications that facilitate informed decisions grounded in both knowledge and quantified uncertainty.
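
As a rough illustration of the general Monte Carlo dropout idea mentioned in the abstract, the sketch below keeps dropout active at prediction time, runs many stochastic forward passes, and reads the spread of the outputs as an uncertainty estimate. It is not the authors' implementation: the toy logistic model, the weights, the Beta(2, 8) prior parameters, the clamp on the dropout rate, and all function names are assumptions made for this example, and the kernel component of BayesJudge is not modelled here.

```python
import math
import random
import statistics


def stochastic_forward(features, weights, bias, drop_p, rng):
    """One forward pass of a toy logistic model with dropout left on."""
    keep = 1.0 - drop_p
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:        # this input unit survives dropout
            z += (x * w) / keep        # inverted-dropout rescaling
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> probability of class 1


def mc_dropout_predict(features, weights, bias,
                       n_passes=200, alpha=2.0, beta=8.0, seed=0):
    """Average many stochastic passes; return (mean, std) of the probability.

    The dropout rate for each pass is drawn from a Beta(alpha, beta) prior.
    The prior parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_passes):
        drop_p = min(rng.betavariate(alpha, beta), 0.95)  # clamp so keep > 0
        probs.append(stochastic_forward(features, weights, bias, drop_p, rng))
    return statistics.fmean(probs), statistics.pstdev(probs)


# Hypothetical feature vector and weights, for illustration only.
mean_p, std_p = mc_dropout_predict([0.9, -0.4, 1.3], [1.2, 0.7, -0.5], 0.1)
print(f"mean probability = {mean_p:.3f}, std across passes = {std_p:.3f}")
```

A larger standard deviation across passes indicates that the prediction is sensitive to which units are dropped, which is one simple way to flag a case for closer scrutiny.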

Paper Structure

This paper contains 14 sections, 7 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Accuracy on original vs. paraphrased test data. The test text was paraphrased into simpler language to show how models react to plainer, real-world text; accuracy decreased as the text became simpler, which is expected.
  • Figure 2: Probability distribution and prediction results for Custom Legal-BERT in the 15-shot setting.
  • Figure 3: Accuracy and Brier score comparison of techniques on less certain predictions.
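
The kind of comparison named in Figure 3 can be sketched as follows: compute accuracy and the Brier score (the mean squared difference between predicted probability and the 0/1 outcome; lower is better) over all predictions and over the less certain ones. The probabilities, labels, the 0.2 margin around 0.5 used to define "less certain", and the function names are all made up for this example and are not taken from the paper.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def accuracy(probs, labels, threshold=0.5):
    """Fraction of cases where the thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(probs)


def least_certain(probs, labels, margin=0.2):
    """Keep cases whose predicted probability lies within `margin` of 0.5."""
    kept = [(p, y) for p, y in zip(probs, labels) if abs(p - 0.5) <= margin]
    return [p for p, _ in kept], [y for _, y in kept]


# Hypothetical predicted probabilities and true labels.
probs = [0.95, 0.55, 0.40, 0.10, 0.65, 0.25]
labels = [1, 0, 1, 0, 1, 0]

sub_probs, sub_labels = least_certain(probs, labels)
print("all cases:    acc =", round(accuracy(probs, labels), 3),
      "brier =", round(brier_score(probs, labels), 3))
print("less certain: acc =", round(accuracy(sub_probs, sub_labels), 3),
      "brier =", round(brier_score(sub_probs, sub_labels), 3))
```

In this toy data the less certain subset has both lower accuracy and a higher Brier score than the full set, which is the pattern that motivates giving such cases extra scrutiny.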