Transforming Medical Regulations into Numbers: Vectorizing a Decade of Medical Device Regulatory Shifts in the USA, EU, and China

Yu Han, Jeroen Bergmann

TL;DR

The paper tackles the complexity of global medical device regulation by converting regulatory texts from the USA, EU, and China into numerical embeddings using NLP. It combines BERT-based named entity recognition, Latent Dirichlet Allocation for topic modeling, and cosine similarity to quantify cross-region alignment across four regulatory phases over a decade. Key contributions include a scalable framework for transforming regulatory documents into embeddings, a regional and temporal analysis of regulatory focus, and a literature-backed account of where harmonization trends emerge, particularly between the USA and the EU in testing and between China and the USA in animal studies. The findings offer actionable insights for policymakers and industry, illustrating how AI-driven regulatory analytics could accelerate harmonization and improve access to innovative medical devices worldwide.

Abstract

Navigating the regulatory frameworks that ensure the safety and efficacy of medical devices can be challenging, especially across different regions. These frameworks often require redundant testing, which slows the delivery of innovations to patients. This study leverages Natural Language Processing (NLP) to analyze 664 regulations and guidelines from the USA, EU, and China over the past decade, covering over 200 million tokens. We categorize regulations into key phases, such as animal studies, clinical trials, and other testing stages, and use Bidirectional Encoder Representations from Transformers (BERT) to perform Named Entity Recognition (NER), identifying key regulatory terms and entities. By converting these texts into numerical representations and segmenting them by phase, country, and year, we compare jurisdictional requirements and assess their alignment. Additionally, we apply Latent Dirichlet Allocation (LDA) for theme analysis to observe changes in regulatory focus over time, reflecting evolving priorities and challenges. Our analysis reveals notable semantic similarities and differences between countries and phases. For instance, the closest alignment in animal study regulations is between China and the USA, with a mean cosine distance of 0.33. These findings highlight the potential of computational methods in regulatory science, offering valuable insights for researchers, policymakers, and industry professionals.
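
To make the alignment metric in the abstract concrete, the following is a minimal sketch of how a mean cosine distance between two groups of embedding vectors could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy three-dimensional vectors, and the choice to average over every cross-group pair are all hypothetical, and the abstract does not say how chunks are paired.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def mean_cosine_distance(group_a, group_b):
    """Average the cosine distance over every cross-group pair (an assumed pairing)."""
    distances = [cosine_distance(u, v) for u, v in product(group_a, group_b)]
    if not distances:
        raise ValueError("both groups must contain at least one vector")
    return sum(distances) / len(distances)


# Toy 3-dimensional "embeddings" with made-up values, not data from the paper.
cn_animal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
us_animal = [[1.0, 0.0, 0.0]]

print(mean_cosine_distance(cn_animal, us_animal))  # prints 0.5
```

Under this definition, a value of 0 means the vectors point in the same direction and larger values mean weaker alignment, so a lower mean distance between two jurisdictions indicates closer semantic alignment.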

Paper Structure

This paper contains 26 sections, 15 figures, 5 tables.

Figures (15)

  • Figure 1: The Workflow of Text Cleaning and Preprocessing
  • Figure 2: Segmentation of Regulations: Labelling, Chunking, Embedding, and Distance Calculation
  • Figure 3: Quantitative corpora amount of different chunks
  • Figure 4: Quantitative analysis of regulation amount in past decade
  • Figure 5: Quantitative analysis of regulation amount in past decade
  • ...and 10 more figures