Leveraging Deep Learning with Multi-Head Attention for Accurate Extraction of Medicine from Handwritten Prescriptions
Usman Ali, Sahil Ranmbail, Muhammad Nadeem, Hamid Ishfaq, Muhammad Umer Ramzan, Waqas Ali
TL;DR
This study addresses the extraction of medicine names from handwritten prescriptions, a task hindered by diverse handwriting styles and prescription formats. It proposes a two-stage hybrid approach: Mask R-CNN segments the medicine regions, and TrOCR, a Transformer-based OCR model, transcribes the text, which is then matched against a medicines database using Levenshtein distance and fuzzy matching. The model is fine-tuned on a novel dataset collected in Pakistan, comprising approximately 1,000 prescriptions from 50 doctors and augmented to 9,920 samples to cover handwriting variability. The reported character error rate (CER) is 1.4% on standard benchmarks, indicating robust recognition and potential for automating medicine-name extraction in clinical workflows.
Abstract
Extracting medication names from handwritten doctor prescriptions is challenging due to the wide variability in handwriting styles and prescription formats. This paper presents a robust method for extracting medicine names that combines Mask R-CNN with Transformer-based Optical Character Recognition (TrOCR), which relies on Multi-Head Attention and Positional Embeddings. A novel dataset of diverse handwritten prescriptions from various regions of Pakistan was used to fine-tune the model on different handwriting styles. The Mask R-CNN model segments each prescription image to isolate the medicinal sections, and the TrOCR model transcribes the isolated text. The transcribed text is then matched against a pre-existing database for accurate identification. The proposed approach achieved a character error rate (CER) of 1.4% on standard benchmarks, highlighting its potential as a reliable and efficient tool for automating medicine name extraction.
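
To make the evaluation metric and the final matching step concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes CER is computed as the Levenshtein edit distance between the reference and the transcription divided by the reference length, which is the usual definition. The function names, the `max_ratio` threshold, and the three medicine names are hypothetical and chosen only for illustration; the paper's actual database and fuzzy-matching settings are not given here.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def match_medicine(
    transcript: str,
    database: Iterable[str],
    max_ratio: float = 0.3,  # assumed rejection threshold, not from the paper
) -> Optional[str]:
    """Return the closest database entry, or None if nothing is close enough."""
    query = transcript.strip().lower()
    best_name, best_dist = None, None
    for name in database:
        dist = levenshtein(query, name.lower())
        if best_dist is None or dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is None:
        return None
    # Normalize by the longer string so the threshold is length-independent.
    if best_dist / max(len(query), len(best_name), 1) > max_ratio:
        return None
    return best_name


# Hypothetical database entries, used only to exercise the functions.
MEDICINES = ["Panadol", "Brufen", "Augmentin"]
corrected = match_medicine("Panadal", MEDICINES)
```

In this sketch a transcription with one wrong character, such as "Panadal", is still resolved to the database entry "Panadol", while a string that resembles no entry is rejected instead of being forced onto the nearest name. The rejection threshold is the main design choice: a looser value recovers more noisy transcriptions but increases the risk of matching the wrong medicine.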
