TrustFed: Enabling Trustworthy Medical AI under Data Privacy Constraints

Vagish Kumar, Syed Bahauddin Alam, Souvik Chakraborty

Abstract

Protecting patient privacy remains a fundamental barrier to scaling machine learning across healthcare institutions, where centralizing sensitive data is often infeasible due to ethical, legal, and regulatory constraints. Federated learning offers a promising alternative by enabling privacy-preserving, multi-institutional training without sharing raw patient data; however, real-world deployments face severe challenges from data heterogeneity, site-specific biases, and class imbalance, which degrade predictive reliability and render existing uncertainty quantification methods ineffective. Here, we present TrustFed, a federated uncertainty quantification framework that provides distribution-free, finite-sample coverage guarantees under heterogeneous and imbalanced healthcare data, without requiring centralized access. TrustFed introduces a representation-aware client assignment mechanism that leverages internal model representations to enable effective calibration across institutions, along with a soft-nearest threshold aggregation strategy that mitigates assignment uncertainty while producing compact and reliable prediction sets. Using over 430,000 medical images across six clinically distinct imaging modalities, we conduct one of the most comprehensive evaluations of uncertainty-aware federated learning in medical imaging, demonstrating robust coverage guarantees across datasets with diverse class cardinalities and imbalance regimes. By validating TrustFed at this scale and breadth, our study advances uncertainty-aware federated learning from proof-of-concept toward clinically meaningful, modality-agnostic deployment, positioning statistically guaranteed uncertainty as a core requirement for next-generation healthcare AI systems.

Paper Structure (10 sections, 12 equations, 4 figures, 1 algorithm)

Figures (4)

  • Figure 1: Overview of the sample-adaptive federated conformal prediction framework. a Dataset overview: Six datasets covering blood cell microscopy, abdominal computed tomography, dermatoscopic skin lesions, retinal fundus imaging, kidney tissue microscopy, and colon histopathology are used to span diverse medical imaging modalities. The bar plots show the class distribution of each dataset, and the example images show one representative sample per dataset to highlight its visual characteristics. A schematic human body symbolically represents each dataset. b Federated training: A global model is initialized and broadcast to all clients. Each client trains locally on its private data and uploads the updated parameters, which are then aggregated to update the global model. c Calibration stage: Each client computes nonconformity scores on a held-out calibration set and stores them locally. d Sample-adaptive conformal prediction: For a given test sample, its feature-space representation is first obtained from the trained model and passed through a similarity estimator. A K-nearest-neighbor search across the client feature banks selects the most relevant calibration instances. The empirical quantile threshold is then obtained with the soft-nearest threshold aggregation strategy. This threshold, together with the model's softmax class probabilities, is used by the prediction set generator to construct a conformal prediction set.
  • Figure 2: Client similarity assignment and calibration of prediction sets under class imbalance. a Client assignment accuracy, computed using Euclidean distance in the feature space. Accuracy improves with larger top-k neighbor sets, supporting reliable client similarity estimation. b The class distribution across clients and the proportion of samples per client, illustrated for blood cell microscopy. c A test sample (lymphocyte) passed through the trained global model initially yields an uncalibrated set of possible labels. Conformal prediction narrows this to a calibrated prediction set with guaranteed coverage $\geq 1-\alpha$.
  • Figure 3: Comparative coverage and cardinality analysis across the six datasets under class imbalance. a The table compares FCP, Local, and the proposed approach across the six datasets at coverage levels $(1-\alpha)=0.9$ and 0.8. The proposed method consistently achieves empirical coverage close to the nominal level while maintaining competitive prediction set cardinality across datasets. b This plot shows coverage and cardinality as a function of $1-\alpha$. c This bar chart shows how coverage and cardinality vary at $(1-\alpha)=0.9$ and 0.8 for different nearest-neighbor (k) values. Larger k improves coverage at the cost of slightly increased cardinality, indicating a trade-off between coverage and set size.
  • Figure 4: Comparative coverage and cardinality analysis across the datasets under sample imbalance. a The table compares coverage and cardinality across all six datasets at coverage levels $(1-\alpha)=0.9$ and 0.8. b This plot shows that TrustFed achieves the required coverage for various values of $\alpha$. c This bar chart shows top-k client assignment accuracy across datasets. d This pie chart shows the distribution of samples across clients, with the same class distribution maintained for each client.
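
The calibration and prediction stages described in Figure 1 (panels c and d) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The captions do not specify the nonconformity score or the aggregation rule, so the sketch assumes a score of 1 minus the softmax probability and a distance-softmax weighting of per-client thresholds. The names `conformal_quantile`, `soft_threshold`, `prediction_set`, and the `temperature` parameter are hypothetical, and no claim is made that this particular weighting preserves the paper's coverage guarantee.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        # Too few calibration points: fall back to 1.0, the upper bound of
        # scores of the (assumed) form 1 - p, so every class is kept.
        return 1.0
    return float(np.sort(scores)[rank - 1])


def soft_threshold(test_feat, client_feats, client_scores, alpha, k=5, temperature=1.0):
    """Hypothetical soft aggregation of per-client thresholds.

    Each client is weighted by the softmax (over negative Euclidean distance)
    mass of the test sample's k nearest calibration features that it owns.
    """
    feats = np.concatenate(client_feats)
    owners = np.concatenate(
        [np.full(len(f), i) for i, f in enumerate(client_feats)]
    )
    dists = np.linalg.norm(feats - test_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    d = dists[nearest]
    # Shift by the minimum distance for numerical stability.
    w = np.exp(-(d - d.min()) / temperature)
    w /= w.sum()
    client_weight = np.bincount(
        owners[nearest], weights=w, minlength=len(client_feats)
    )
    thresholds = np.array([conformal_quantile(s, alpha) for s in client_scores])
    return float(client_weight @ thresholds)


def prediction_set(probs, threshold):
    """Keep every class whose nonconformity score 1 - p is within the threshold."""
    return np.flatnonzero(1.0 - probs <= threshold)


# Toy example: two clients with well-separated feature banks (synthetic data).
client_feats = [np.zeros((10, 2)), np.full((10, 2), 10.0)]
client_scores = [np.arange(1, 11) / 20, np.arange(1, 11) / 10]

tau = soft_threshold(np.zeros(2), client_feats, client_scores, alpha=0.25, k=5)
labels = prediction_set(np.array([0.7, 0.2, 0.1]), tau)
```

In this toy setup the test feature coincides with the first client's feature bank, so that client's threshold dominates. A test feature equidistant from both banks would receive an equal blend of the two thresholds.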