A Retrieval-Based Approach to Medical Procedure Matching in Romanian
Andrei Niculae, Adrian Cosma, Emilian Radoi
TL;DR
This work tackles the challenge of aligning Romanian medical procedure names with insurance-standardized codes by reframing the task as retrieval rather than multiclass classification. It evaluates dense and sparse embeddings, including a fine-tuned mE5 model, within a Milvus vector store, and demonstrates that dense, metric-learning-tuned representations significantly outperform BM25 and hybrid approaches. The best system achieves an Acc@1 of 85.2% in a setup combining masterlist entries with clinic mappings, and a doctor-validated Acc@1 of 94.7% with a 1200× speedup over manual mapping, underscoring its practical impact on private healthcare reimbursement workflows in Romania. The results advance medical NLP for low-resource languages and suggest that robust, scalable retrieval-based matching can be extended to similar settings with limited language-specific medical resources.
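Because the task is framed as retrieval, each query yields a ranked list of candidate codes, and Acc@1 asks whether the correct code is ranked first. The sketch below shows the generic Acc@k computation under that assumption; the function name `accuracy_at_k` and the codes `C101`, `C205`, `C330` are hypothetical toy values, not the paper's data or evaluation code.

```python
from typing import Sequence


def accuracy_at_k(ranked_codes: Sequence[Sequence[str]],
                  gold_codes: Sequence[str],
                  k: int = 1) -> float:
    """Fraction of queries whose gold code appears among the top-k retrieved codes."""
    if len(ranked_codes) != len(gold_codes):
        raise ValueError("ranked_codes and gold_codes must have the same length")
    if not gold_codes:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_codes, gold_codes) if gold in ranked[:k])
    return hits / len(gold_codes)


# Hypothetical toy rankings, one list per query, best candidate first.
ranked = [
    ["C101", "C205", "C330"],  # gold C101 at rank 1
    ["C205", "C101", "C330"],  # gold C101 at rank 2
    ["C330", "C205", "C101"],  # gold C101 at rank 3
    ["C205", "C330", "C101"],  # gold C205 at rank 1
]
gold = ["C101", "C101", "C101", "C205"]

print(accuracy_at_k(ranked, gold, k=1))  # 0.5
print(accuracy_at_k(ranked, gold, k=3))  # 1.0
```

Acc@1 is the strictest setting: only the top-ranked candidate counts, which matches an automated workflow that assigns a single code per procedure.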
Abstract
Accurately mapping medical procedure names from healthcare providers to the standardized terminology used by insurance companies is a crucial yet complex task. Inconsistent naming conventions lead to misclassified procedures, causing administrative inefficiencies and insurance claim problems in private healthcare settings. Many companies still rely on manual mapping by human staff, even though the task is a clear candidate for automation. The challenge is significantly harder in underrepresented languages such as Romanian, where existing pretrained language models lack domain-specific adaptation to medical text. This paper proposes a retrieval-based architecture that leverages sentence embeddings for medical procedure name matching in the Romanian healthcare system. We evaluate multiple embedding models, including Romanian, multilingual, and medical-domain-specific representations, to identify the most effective solution for this task. Our findings contribute to the broader field of medical NLP for low-resource languages such as Romanian.
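In a retrieval-based architecture, each standardized procedure is stored as an embedding, and a provider's procedure name is matched by finding its nearest neighbors. The sketch below illustrates that step with exact cosine-similarity search in pure Python. It assumes embeddings have already been produced by some sentence encoder; the three-dimensional vectors and the `CODE-A`/`CODE-B`/`CODE-C` labels are hypothetical placeholders, and a vector store such as Milvus would replace this brute-force loop with indexed search at scale.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> list[float]:
    # Scale to unit length so that a dot product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query: Vector, index: dict[str, Vector], k: int = 1) -> list[tuple[str, float]]:
    """Exact (brute-force) cosine-similarity search over a code -> embedding index."""
    q = normalize(query)
    scores = []
    for code, emb in index.items():
        e = normalize(emb)
        scores.append((code, sum(a * b for a, b in zip(q, e))))
    # Highest similarity first.
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:k]


# Hypothetical precomputed embeddings of masterlist entries (placeholders, not model output).
index = {
    "CODE-A": [1.0, 0.0, 0.0],
    "CODE-B": [0.0, 1.0, 0.0],
    "CODE-C": [0.7, 0.7, 0.0],
}
# Hypothetical embedding of a provider's procedure name.
query = [0.9, 0.1, 0.0]

for code, score in top_k(query, index, k=2):
    print(code, round(score, 3))
```

Returning a ranked top-k list rather than a single label is what distinguishes this framing from multiclass classification: new standardized codes can be added to the index without retraining a classifier head.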
