Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction

Sagnik Basu, Shubham Prakash, Ashish Maruti Barge, Siddharth D Jaiswal, Abhisek Dash, Saptarshi Ghosh, Animesh Mukherjee

TL;DR

This work studies bail prediction as a high-stakes multimodal task by pairing mugshot images with case texts to audit vision-language models (VLMs). It introduces two interventions—a retrieval-augmented generation (RAG) framework to inject precedents and specialized fine-tuning with typed-facts—and evaluates their effect across intersectional groups. Base VLMs exhibit poor performance and high confidence in incorrect denials, but the interventions yield substantial gains, with several configurations reaching and exceeding 70 percent accuracy. The findings clarify both the potential and the limits of VLMs in legal AI, underscoring the need for human oversight, responsible deployment, and regulatory safeguards while offering a practical pathway toward safer assistive use in courts.

Abstract

Large language models (LLMs) have been extensively used for legal judgment prediction tasks based on case reports and crime history. However, with a surge in the availability of large vision language models (VLMs), legal judgment prediction systems can now be made to leverage the images of the criminals in addition to the textual case reports/crime history. Applications built in this way could lead to inadvertent consequences and be used with malicious intent. In this work, we run an audit to investigate the efficiency of standalone VLMs in the bail decision prediction task. We observe that the performance is poor across multiple intersectional groups and models wrongly deny bail to deserving individuals with very high confidence. We design different intervention algorithms by first including legal precedents through a RAG pipeline and then fine-tuning the VLMs using innovative schemes. We demonstrate that these interventions substantially improve the performance of bail prediction. Our work paves the way for the design of smarter interventions on VLMs in the future, before they can be deployed for real-world legal judgment prediction.

Paper Structure

This paper contains 22 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overall fine-tuning design. In the vanilla setup, typed-facts are not added; in the offense-type-induced setup, the whole architecture is used.