
Multi-lingual Multi-institutional Electronic Health Record based Predictive Model

Kyunghoon Hur, Heeyoung Kwak, Jinsu Jang, Nakhwan Kim, Edward Choi

Abstract

Large-scale EHR prediction across institutions is hindered by substantial heterogeneity in schemas and code systems. Although Common Data Models (CDMs) can standardize records for multi-institutional learning, manual harmonization and vocabulary mapping are costly and difficult to scale. Text-based harmonization offers an alternative by converting raw EHR into a unified textual form, enabling pooled learning without explicit standardization. However, applying this paradigm to multinational datasets introduces an additional layer of heterogeneity, language, which must be addressed for truly scalable EHR learning. In this work, we investigate multilingual multi-institutional learning for EHR prediction, aiming to enable pooled training across multinational ICU datasets without manual standardization. We compare two practical strategies for handling language barriers: (i) directly modeling multilingual records with multilingual encoders, and (ii) translating non-English records into English via LLM-based word-level translation. Across seven public ICU datasets and ten clinical tasks with multiple prediction windows, translation-based lingual alignment yields more reliable cross-dataset performance than multilingual encoders. The multi-institutional model consistently outperforms strong baselines that require manual feature selection and harmonization, and also surpasses single-dataset training. We further demonstrate that the text-based framework with lingual alignment supports transfer learning via few-shot fine-tuning, yielding additional gains. To our knowledge, this is the first study to aggregate multilingual, multinational ICU EHR datasets into one predictive model, providing a scalable path toward language-agnostic clinical prediction and future global multi-institutional EHR research.
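The text-based harmonization described above turns rows from heterogeneous EHR tables into a single textual sequence per patient stay, so that datasets with different schemas can be pooled without feature alignment. The following is a minimal sketch of that linearization idea; the table names, column names, and serialization format here are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: linearize heterogeneous EHR table rows into one unified text
# sequence per ICU stay. All names and the serialization scheme are
# hypothetical examples of text-based harmonization, not the paper's code.

def linearize_event(table_name: str, row: dict) -> str:
    """Serialize one EHR table row as 'table column value column value ...'."""
    parts = [table_name]
    for column, value in row.items():
        if value is None or value == "":
            continue  # skip empty cells rather than emitting placeholders
        parts.append(f"{column} {value}")
    return " ".join(parts)

def linearize_stay(events: list) -> str:
    """Concatenate chronologically ordered events into a single text."""
    ordered = sorted(events, key=lambda e: e[0])  # sort by timestamp
    return " [SEP] ".join(linearize_event(name, row) for _, name, row in ordered)

# Example: two events drawn from different (hypothetical) source schemas.
events = [
    (5, "labevents", {"itemid": "creatinine", "value": 1.2, "unit": "mg/dL"}),
    (1, "inputevents", {"label": "NaCl 0,9%", "rate": 100, "rateuom": "mL/hr"}),
]
print(linearize_stay(events))
```

Because each institution's raw column names and values are carried through verbatim, no cross-site vocabulary mapping is needed; language differences in the values themselves are then handled by multilingual encoding or LLM-based translation.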


Paper Structure

This paper contains 28 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of the multilingual multi-institutional EHR predictive framework. EHR time-series from multiple hospitals and nations—recorded under heterogeneous code systems, schemas, and languages (e.g., English, Dutch, German)—are harmonized into a unified textual/event representation and jointly used to train a single foundational predictive model. The resulting model is evaluated across diverse ICU prediction tasks, demonstrating scalable pooled learning without manual standardization.
  • Figure 2: Language composition of clinical text across seven ICU EHR datasets. Each bar shows the percentage of tokens identified as English (en), Dutch (nl), German (de), or undetected by our language identification pipeline. “Undetected” denotes tokens that are not present in standard word lexicons and are not confidently assigned to any language by the identifier—typically domain-specific abbreviations or proper nouns (e.g., “PO” for oral administration).
  • Figure 3: Comparison of conventional multi-institutional learning and our multi-lingual text-based workflow. The conventional pipeline (top) requires each hospital to manually select, share, and align a common feature set across heterogeneous EHR schemas and code systems before model training. In contrast, the proposed workflow (bottom) linearizes raw EHR tables from different institutions and languages into a unified textual representation, applies LLM-based translation to standardize non-English content into English, and trains a single pooled model directly on the harmonized text for downstream task prediction with minimal preprocessing.
  • Figure 4: Cross-site transfer learning performance across seven datasets when fine-tuning on only 10% of the target data for ReMED+LLM Align (left) and YAIB (right). Each heatmap shows AUROC when training on the source dataset (rows) and evaluating on the target dataset (columns); diagonal cells represent the single-dataset models (no transfer), while off-diagonal cells correspond to transfer learning via fine-tuning on the target site.
  • Figure 5: Cross-site transfer learning performance across seven datasets when fully fine-tuning on the target data for ReMED+LLM Align (left) and YAIB (right). Each heatmap shows AUROC when training on the source dataset (rows) and evaluating on the target dataset (columns); diagonal cells represent the single-dataset models (no transfer), while off-diagonal cells correspond to transfer learning via fine-tuning on the target site.
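Figure 2's language composition relies on a per-token identification pipeline in which tokens absent from every language lexicon (e.g., clinical abbreviations such as "PO") are labeled undetected. The sketch below illustrates that lexicon-lookup idea with tiny stand-in word lists; a real pipeline would use full lexicons or a trained identifier, and the exact rules here are assumptions, not the paper's implementation.

```python
# Sketch: per-token language identification over clinical text. Tokens are
# checked against small per-language lexicons; tokens found in no lexicon
# (or in several, hence ambiguous) are labeled "undetected". The lexicons
# below are tiny illustrative stand-ins, not real word lists.

LEXICONS = {
    "en": {"blood", "pressure", "oral", "sodium", "chloride"},
    "nl": {"bloeddruk", "infuus", "zuurstof"},
    "de": {"blutdruck", "infusion", "sauerstoff"},
}

def identify_token(token: str) -> str:
    """Return the language code of a token, or 'undetected'."""
    word = token.lower()
    hits = [lang for lang, lexicon in LEXICONS.items() if word in lexicon]
    if len(hits) == 1:
        return hits[0]
    # Not in any lexicon (e.g. abbreviations like "PO"), or ambiguous.
    return "undetected"

tokens = ["Bloeddruk", "PO", "sodium", "Blutdruck"]
print([identify_token(t) for t in tokens])
```

Aggregating these per-token labels over a whole dataset yields the language-composition percentages shown in Figure 2.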