Measuring Cross-Jurisdictional Transfer of Medical Device Risk Concepts with Explainable AI

Yu Han, Aaron Ceross

Abstract

Medical device regulators in the United States (FDA), China (NMPA), and Europe (EU MDR) all use the language of risk, but classify devices through structurally different mechanisms. Whether these apparently shared concepts carry transferable classificatory signal across jurisdictions remains unclear. We test this by reframing explainable AI as an empirical probe of cross-jurisdictional regulatory overlap. Using 141,942 device records, we derive seven EU MDR risk factors, including implantability, invasiveness, and duration of use, and evaluate their contribution across a three-by-three transfer matrix. Under a symmetric extraction pipeline designed to remove jurisdiction-specific advantages, factor contribution is negligible in all jurisdictions, indicating that clean cross-jurisdictional signal is at most marginal. Under jurisdiction-specific pipelines, a modest gain appears only in the EU MDR-to-NMPA direction, but sensitivity analyses show that this effect is weak, context-dependent, and partly confounded by extraction and representation choices. Reverse-direction probes show strong asymmetry: FDA-derived factors do not transfer meaningfully in any direction, and NMPA-derived factors do not carry signal back to EU MDR. Zero-shot transfer further fails on EU MDR Class I, consistent with a mismatch between residual and positional class definitions. Overall, cross-jurisdictional transfer is sparse, asymmetric, and weak. Shared regulatory vocabulary does not, under this operationalisation, translate into strong portable classification logic. The findings challenge a common assumption in cross-jurisdictional regulatory AI and show how explainable AI can be used to measure, rather than assume, regulatory overlap.

Paper Structure

This paper contains 39 sections, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Study overview. The EU MDR classifies medical devices through 22 explicit rules in Annex VIII, from which we extract seven regulatory risk factors (implantable, duration of use, invasiveness, active device, sterile, measuring function, and reusable). These factors serve as an empirical probe, testing whether risk concepts that carry classification signal in one regulatory system transfer meaningfully to structurally different ones such as NMPA's catalogue-based approach and FDA's predicate-based approach. The probe beam (centre) illustrates the observed gradient: factor contribution is highest within the originating rule-based system (a ceiling case reflecting circularity), drops to modest levels in the catalogue-based system, and falls to negligible levels in the predicate-based system.
  • Figure 2: Factor contribution ($\Delta$F1) under two estimands (EU MDR hatched grey = source-jurisdiction ceiling, not commensurable with cross-jurisdictional results). (A) Symmetric pipeline (primary estimand): all jurisdictions $\Delta$F1 $<$ 0.01, confirming the clean cross-jurisdictional signal is at most marginal. (B) Jurisdiction-specific pipeline (upper bound): NMPA $+$0.024 [95% CI $+$0.023, $+$0.024]; FDA $+$0.006 [95% CI $+$0.005, $+$0.007]; EU MDR ceiling $+$0.192 (circular, excluded from cross-jurisdictional inference). Error bars = 95% CI from 10 seed-level means.
  • Figure 3: EU MDR zero-shot transfer failure. (A) Classwise F1 under the unified model (EU MDR in-domain) vs. zero-shot transfer (train FDA+NMPA, test EU MDR) under the multilingual sensitivity protocol (Section \ref{sec:multilingual}): Class I collapses to near zero under both encoders (TF-IDF: 0.029; MiniLM: 0.009); under the main analysis protocol, the collapse is even more severe (Class I F1 = 0.001, Table \ref{tab:master_results}). (B) Mechanism tests: Test A (matched class priors, undersampling Class I to 2.5%) leaves Class I F1 at 0.000, ruling out prior mismatch; Test B (excluding Class I) yields macro-F1 = 0.745, confirming Class I collapse is the dominant degradation source.
  • Figure 4: Two-by-two adjudication of NMPA factor contribution ($\Delta$F1), crossing text encoder (TF-IDF vs. multilingual MiniLM) with device-type control (39 one-hot NMPA category codes). NMPA factor gain is positive in all four conditions, but smaller once device-type control is added. Under TF-IDF, $\Delta$F1 falls from +0.024 to +0.004 with control; under MiniLM, it falls from +0.023 to +0.016. FDA values remain negligible in all conditions ($\leq +0.003$). The figure therefore indicates that the NMPA signal is reduced, but not eliminated, by device-type control, and that this residual is larger under the stronger multilingual encoder.
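
The captions above report factor contribution as $\Delta$F1 with a 95% CI from 10 seed-level means. A minimal Python sketch of that kind of summary follows. It rests on several assumptions not confirmed by the text shown here: that $\Delta$F1 is macro-F1 with the factors minus macro-F1 without them, and that the interval is a Student-t interval over the seed-level values. The function names and the example numbers are hypothetical and are not the paper's data.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 9 degrees of freedom
# (assumes exactly 10 seed-level values; change it for other seed counts).
T_CRIT_DF9 = 2.262


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def delta_f1_ci(deltas, t_crit=T_CRIT_DF9):
    """Mean and t-based 95% CI over seed-level delta-F1 values.

    Each entry of `deltas` is assumed to be
    macro_f1(with factors) - macro_f1(without factors) for one seed.
    """
    n = len(deltas)
    mean = statistics.mean(deltas)
    half_width = t_crit * statistics.stdev(deltas) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical seed-level deltas, for illustration only.
    example = [0.023, 0.024, 0.024, 0.023, 0.025,
               0.024, 0.023, 0.024, 0.024, 0.023]
    mean, lo, hi = delta_f1_ci(example)
    print(f"delta F1 = {mean:+.3f} [95% CI {lo:+.3f}, {hi:+.3f}]")
```

Summarising over seed-level means rather than individual predictions treats each seed as one independent replicate, so the interval reflects run-to-run variation only and not sampling uncertainty in the test set.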