Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Shubham Kumar Nigam, Parjanya Aditya Shukla, Noel Shallum, Arnab Bhattacharya

TL;DR

The study benchmarks two paradigms for translating handwritten Marathi legal documents into English: a modular pipeline of OCR followed by machine translation (OCR-MT), and end-to-end Vision-Language Models (vLLMs) that translate directly from handwritten images. Using a curated corpus of roughly 60 handwritten Marathi legal documents with ground-truth English translations, the authors pair OCR tools (Tesseract, EasyOCR, PaddleOCR) with IndicTrans2 and Sarvam-1 for translation, and compare these pipelines against vLLMs (Chitrarth, Ovis2, Maya-8B) prompted zero-shot. The findings show that OCR-MT suffers from cascading errors, because recognition mistakes propagate into the translation, while vLLMs offer promising end-to-end translation but still struggle with precise legal semantics and reliability, and their output depends heavily on the prompt. The work points toward robust, edge-deployable legal digitization systems and recommends hybrid approaches, domain-specific fine-tuning, and expanded datasets for real-world deployment in India's judiciary.

Abstract

Handwritten text recognition (HTR) and machine translation continue to pose significant challenges, particularly for low-resource languages like Marathi, which lack large digitized corpora and exhibit high variability in handwriting styles. The conventional approach to address this involves a two-stage pipeline: an OCR system extracts text from handwritten images, which is then translated into the target language using a machine translation model. In this work, we explore and compare the performance of traditional OCR-MT pipelines with Vision Large Language Models that aim to unify these stages and directly translate handwritten text images in a single, end-to-end step. Our motivation is grounded in the urgent need for scalable, accurate translation systems to digitize legal records such as FIRs, charge sheets, and witness statements in India's district and high courts. We evaluate both approaches on a curated dataset of handwritten Marathi legal documents, with the goal of enabling efficient legal document processing, even in low-resource environments. Our findings offer actionable insights toward building robust, edge-deployable solutions that enhance access to legal information for non-native speakers and legal professionals alike.
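
The contrast between the two paradigms can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the stage signatures (`OcrFn`, `MtFn`, `VlmFn`), the toy lexicon of romanized tokens, and the stub OCR functions are all hypothetical stand-ins for real components such as Tesseract, IndicTrans2, or a vLLM. The character error rate helper is likewise an assumed way to quantify recognition mistakes, not necessarily the paper's metric. The point is to show how a single misrecognized character in the first stage can corrupt the translation produced by the second.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not from the paper).
OcrFn = Callable[[bytes], str]       # image bytes -> recognized source text
MtFn = Callable[[str], str]          # source text -> English text
VlmFn = Callable[[bytes, str], str]  # image bytes + prompt -> English text


@dataclass
class TranslationResult:
    recognized: str   # intermediate OCR text (empty for the end-to-end path)
    translation: str


def translate_ocr_mt(image: bytes, ocr: OcrFn, mt: MtFn) -> TranslationResult:
    """Two-stage pipeline: the MT stage only ever sees the OCR output."""
    recognized = ocr(image)
    return TranslationResult(recognized, mt(recognized))


def translate_end_to_end(image: bytes, vlm: VlmFn, prompt: str) -> TranslationResult:
    """Single-stage path: one model maps the image directly to English."""
    return TranslationResult("", vlm(image, prompt))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by the reference length."""
    return edit_distance(hypothesis, reference) / max(len(reference), 1)


# Toy stand-ins: a two-word lexicon of romanized tokens and two OCR stubs.
TOY_LEXICON = {"dinank": "date", "sakshidar": "witness"}


def toy_mt(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word, "<unk>") for word in text.split())


def clean_ocr(_image: bytes) -> str:
    return "sakshidar dinank"


def noisy_ocr(_image: bytes) -> str:
    return "sakshidar dinamk"  # one misread character


clean = translate_ocr_mt(b"", clean_ocr, toy_mt)
noisy = translate_ocr_mt(b"", noisy_ocr, toy_mt)
noisy_cer = char_error_rate(noisy.recognized, clean.recognized)
```

In this toy setup, a one-character recognition error is enough to make the second word untranslatable. The end-to-end path has no intermediate text to corrupt, but it also exposes no intermediate text to inspect or correct, which is one reason a hybrid design may be attractive.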

Paper Structure

This paper contains 11 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Comparison of OCR-MT and vLLM-based approaches for handwritten text translation. The OCR-MT pipeline decomposes the task into separate HTR and MT stages, whereas vLLMs unify the process into a single end-to-end step.
  • Figure 2: Accurate recognition of printed and handwritten Marathi numeric characters.
  • Figure 3: Incorrect extraction of handwritten dates.
  • Figure 4: Correct extraction by OCR models for Marathi.
  • Figure 5: Correct extraction of Marathi stamp details.