Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions

Usman Ali, Sahil Ranmbail, Muhammad Nadeem, Hamid Ishfaq, Muhammad Umer Ramzan, Waqas Ali

TL;DR

This study addresses the extraction of medicine names from handwritten prescriptions, a task made difficult by diverse handwriting styles and prescription formats. It proposes a two-stage hybrid approach: Mask R-CNN segments the regions of interest, and TrOCR, a Transformer-based OCR model, transcribes the text, which is then matched against a medicines database using Levenshtein and fuzzy matching. The model is fine-tuned on a new dataset collected in Pakistan of approximately 1,000 prescriptions from 50 doctors, augmented to 9,920 samples to cover handwriting variability. The reported character error rate (CER) is 1.4% on standard benchmarks, which suggests robust recognition and potential for automating medicine-name extraction in clinical workflows.

Abstract

Extracting medication names from handwritten doctor prescriptions is challenging due to the wide variability in handwriting styles and prescription formats. This paper presents a robust method for extracting medicine names using a combination of Mask R-CNN and Transformer-based Optical Character Recognition (TrOCR) with Multi-Head Attention and Positional Embeddings. A novel dataset of diverse handwritten prescriptions from various regions of Pakistan was used to fine-tune the model on different handwriting styles. The Mask R-CNN model segments the prescription images to isolate the medicinal sections, while the TrOCR model, enhanced by Multi-Head Attention and Positional Embeddings, transcribes the isolated text. The transcribed text is then matched against a pre-existing database for accurate identification. The proposed approach achieved a character error rate (CER) of 1.4% on standard benchmarks, highlighting its potential as a reliable and efficient tool for automating medicine name extraction.
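
The final matching step and the CER metric can be illustrated with a minimal sketch. It is not the authors' implementation: the TL;DR names Levenshtein and fuzzy matching, but the function names, the `max_ratio` threshold of 0.3, and the sample medicine names below are assumptions chosen for illustration. CER is computed here in its common form, edit distance divided by reference length.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def match_medicine(text: str, database: Iterable[str],
                   max_ratio: float = 0.3) -> Optional[str]:
    """Return the closest database entry, or None if nothing is close enough.

    max_ratio is a hypothetical cutoff on distance / longer-string length.
    """
    query = text.strip().lower()
    best_name, best_dist = None, None
    for name in database:
        dist = levenshtein(query, name.lower())
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is None:
        return None
    if best_dist / max(len(query), len(best_name), 1) > max_ratio:
        return None
    return best_name


if __name__ == "__main__":
    # Illustrative entries only; the paper's database is not reproduced here.
    db = ["Panadol", "Brufen", "Augmentin"]
    print(match_medicine("Panadal", db))
    print(cer("panadol", "panadal"))
```

Normalising the distance by string length keeps the cutoff comparable across short and long names; a production system would likely tune this threshold on held-out transcriptions.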

Paper Structure

This paper contains 9 sections, 19 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Annotated Image
  • Figure 2: Overall Workflow
  • Figure 3: Segmentation of medicinal areas using Mask R-CNN