Automated Generation of Custom MedDRA Queries Using SafeTerm Medical Map

Francois Vandenhende, Anna Georgiou, Michalis Georgiou, Theodoros Psaras, Ellie Karekla, Elena Hadjicosta

TL;DR

This work presents SafeTerm, an AI-driven system that maps input medical concepts to MedDRA Preferred Terms (PTs) by embedding terms in a unified vector space and applying cosine similarity with clustering to rank candidates. The Automated Medical Query (AMQ) pipeline performs best-term matching, similarity scoring, and clustering to produce a ranked set of PTs for signal-detection queries, validated against FDA OCMQ v3.0. Although it is not fine-tuned on SMQs or OCMQs, the approach achieves high recall at moderate thresholds and an F1 of around 0.39 at the optimal threshold, supporting its use as a scalable, version-robust first-pass retrieval tool. The study highlights the importance of precise MedDRA terminology and suggests starting with a moderate similarity threshold (≈0.60) and increasing it for refined term selection. Overall, SafeTerm offers a viable, unsupervised method for automated MedDRA query generation that can adapt across dictionary versions and reduce the manual maintenance burden.
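The similarity-scoring step described above can be illustrated with a minimal sketch. It is not the SafeTerm implementation: the embedding model is not specified here, so the three-dimensional vectors, the placeholder labels `PT_A` to `PT_C`, and the helper names `cosine_similarity` and `rank_terms` are all assumptions made for illustration, and the clustering step is omitted.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors, chosen for this sketch
    return dot / (norm_u * norm_v)

def rank_terms(query_vec, term_vecs, threshold=0.60):
    """Keep terms whose similarity to the query meets the threshold,
    sorted from most to least similar."""
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in term_vecs.items()]
    kept = [(term, score) for term, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy embeddings; real embeddings would be high-dimensional.
toy_terms = {
    "PT_A": [1.0, 0.0, 0.0],
    "PT_B": [0.8, 0.6, 0.0],
    "PT_C": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]
ranked = rank_terms(query, toy_terms, threshold=0.60)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

With these toy vectors, `PT_A` and `PT_B` pass the 0.60 cut-off and `PT_C` is dropped. Raising `threshold` shrinks the retrieved list, which is the lever the summary recommends for refining term selection.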

Abstract

In pre-market drug safety review, grouping related adverse event terms into standardised MedDRA queries or the FDA Office of New Drugs Custom Medical Queries (OCMQs) is critical for signal detection. We present a novel quantitative artificial intelligence system that understands and processes medical terminology and automatically retrieves relevant MedDRA Preferred Terms (PTs) for a given input query, ranking them by a relevance score using multi-criteria statistical methods. The system (SafeTerm) embeds medical query terms and MedDRA PTs in a multidimensional vector space, then applies cosine similarity and extreme-value clustering to generate a ranked list of PTs. Validation was conducted against the FDA OCMQ v3.0 (104 queries), restricted to valid MedDRA PTs. Precision, recall and F1 were computed across similarity thresholds. High recall (>95%) was achieved at moderate thresholds, and higher thresholds improved precision (up to 86%). The optimal threshold (~0.70–0.75) yielded recall of ~50% and precision of ~33%. Narrow-term PT subsets performed similarly but required slightly higher similarity thresholds. The SafeTerm AI-driven system provides a viable supplementary method for automated MedDRA query generation. A similarity threshold of ~0.60 is recommended initially, with increased thresholds for refined term selection.
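To make the evaluation concrete, the sketch below shows one way precision, recall and F1 could be computed for a single query at several similarity cut-offs. The scores, the reference set and the function names are hypothetical and are not taken from the paper's data or code; the paper reports these metrics averaged across the 104 OCMQs.

```python
def precision_recall_f1(retrieved, reference):
    """Set-based precision, recall and F1 for one query."""
    retrieved, reference = set(retrieved), set(reference)
    true_positives = len(retrieved & reference)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def sweep_thresholds(scores, reference, thresholds):
    """Evaluate the retrieved set at each similarity cut-off."""
    results = {}
    for cutoff in thresholds:
        retrieved = {term for term, score in scores.items() if score >= cutoff}
        results[cutoff] = precision_recall_f1(retrieved, reference)
    return results

# Hypothetical similarity scores for five candidate PTs and a
# hypothetical reference query containing two of them.
toy_scores = {"PT_1": 0.92, "PT_2": 0.78, "PT_3": 0.66, "PT_4": 0.61, "PT_5": 0.40}
toy_reference = {"PT_1", "PT_3"}

metrics = sweep_thresholds(toy_scores, toy_reference, [0.60, 0.75, 0.90])
for cutoff, (p, r, f1) in metrics.items():
    print(f"cut-off {cutoff:.2f}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy case recall is complete at 0.60 and precision reaches 1.0 only at 0.90, mirroring the trade-off described in the abstract. As a plausibility check on the reported figures, the harmonic mean of a recall of 0.50 and a precision of 0.33 is $2 \times 0.50 \times 0.33 / 0.83 \approx 0.40$.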

Paper Structure

This paper contains 14 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: SafeTerm Automated Medical Query System.
  • Figure 2: Mean $\pm$ SD performance across similarity cut-offs.
  • Figure 3: Performance for OCMQ Narrow Terms Retrieval.
  • Figure 4: Mean $\pm$ SD: Precision, Recall, F1 versus similarity cut-off for Narrow-Term retrieval (across OCMQs).