MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions

Tavish Mankash, V. S. Chaithanya Kota, Anish De, Praveen Prakash, Kshitij Jadhav

TL;DR

This work extracts medication names and dosages from simulated medical records in India by fine-tuning the QWEN VL, LLaVA 1.6 and Idefics2 models, achieving 82% accuracy.

Abstract

Hospitals in India still rely on handwritten medical records despite the availability of Electronic Medical Records (EMR), which complicates statistical analysis and record retrieval. Handwritten records pose a unique challenge, requiring specialized data for training models to recognize medications and their recommendation patterns. While traditional handwriting recognition approaches employ 2-D LSTMs, recent studies have explored using Multimodal Large Language Models (MLLMs) for OCR tasks. Building on this approach, we focus on extracting medication names and dosages from simulated medical records. Our methodology, MIRAGE (Multimodal Identification and Recognition of Annotations in indian GEneral prescriptions), involves fine-tuning the QWEN VL, LLaVA 1.6 and Idefics2 models on 743,118 high-resolution, fully annotated simulated medical record images from 1,133 doctors across India. Our approach achieves 82% accuracy in extracting medication names and dosages.

Paper Structure

This paper contains 10 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Difference between digital and paper-written handwriting. (a) From Tabassum et al. [tabassum2021recognition], reprinted with permission © 2021 IEEE. (b) Typical prescribed medication from our dataset.
  • Figure 2: (a) Tilted text complicates word isolation. Chronological order would be incomprehensible without understanding sectioning
  • Figure 3: Medicine name distribution, with the most frequent on the left. Percentage frequency is in log scale.
  • Figure 4: Model precision when identifying the top 'N' most frequently prescribed medications per doctor.
  • Figure 5: LLaVA architecture. Source: Liu et al. [liu2023llava].
  • ...and 4 more figures