Improving the Classification Effect of Clinical Images of Diseases for Multi-Source Privacy Protection

Tian Bowen; Xu Zhengyang; Yin Zhihao; Wang Jingying; Yue Yutao

TL;DR

Privacy constraints hinder data sharing across hospitals for medical image analysis. The authors propose a data-vector framework in which each hospital fine-tunes a common pre-trained network on its own private data and computes a per-site data vector $\tau_j = \theta_j - \theta_{pre}$. The data vectors are linearly combined into $\tau_{sum}$ and added to the pre-trained weights to form synthetic weights $\theta_{mix} = \theta_{pre} + \tau_{sum}$, so no private data is exchanged. The fusion model is obtained by applying the synthesized vector, followed by a small restoration step that aligns batch-norm statistics. In experiments on PAD-UFES-20, Retina, and Endoscopic Bladder Tissue, the data-vector method significantly outperforms single-site fine-tuning and rivals full-data training, while random vectors underperform, illustrating the effectiveness of data-vector-driven fusion for privacy-preserving medical AI. The work offers a practical way to leverage dispersed medical data while preserving privacy, gives theoretical intuition for why parameter mixing improves generalization, and suggests directions for future exploration.
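
Written out, the quantities in the summary above relate as follows. This is a sketch that assumes the simplest linear combination, a plain unweighted sum over $N$ participating hospitals, as the abstract describes; the symbol $N$ and the two-hospital labels $A$ and $B$ are notation introduced here for illustration, not taken from the paper.

$$\tau_j = \theta_j - \theta_{pre}, \qquad \tau_{sum} = \sum_{j=1}^{N} \tau_j, \qquad \theta_{mix} = \theta_{pre} + \tau_{sum} = \theta_{pre} + \sum_{j=1}^{N} \left(\theta_j - \theta_{pre}\right)$$

For two hospitals $A$ and $B$ ($N = 2$) this reduces to $\theta_{mix} = \theta_A + \theta_B - \theta_{pre}$: only fine-tuned weights and the shared starting point enter the fusion, never the training images.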

Abstract

Protecting private data in the medical field makes data sharing difficult, which limits the ability to integrate data across hospitals for training high-precision auxiliary diagnostic models. Traditional centralized training is hard to apply because it violates privacy protection principles. Federated learning, a distributed machine learning framework, helps address this issue, but it requires multiple hospitals to participate in training simultaneously, which is hard to achieve in practice. To address these challenges, we propose a training framework for private medical data based on data vectors. Each hospital fine-tunes a pre-trained model on its private data and calculates a data vector, which represents the optimization direction of the model parameters in the solution space. The data vectors are then summed to generate synthetic weights that integrate model information from multiple hospitals. This approach enhances model performance without exchanging private data or requiring synchronous training. Experimental results demonstrate that the method effectively utilizes dispersed private data resources while protecting patient privacy. The auxiliary diagnostic model trained with this approach significantly outperforms models trained independently by a single hospital, providing a new perspective for resolving the conflict between medical data privacy protection and model training and advancing the development of medical intelligence.
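
The fine-tune, subtract, and sum procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: model weights are represented as plain dictionaries of NumPy arrays, the function names `data_vector` and `fuse` and the `scale` coefficient are hypothetical, and the toy "fine-tuned" weights stand in for real per-hospital training. The batch-norm restoration step is not shown because the summary does not specify it.

```python
import numpy as np


def data_vector(theta_site, theta_pre):
    """Per-site data vector: tau_j = theta_j - theta_pre, parameter by parameter."""
    return {name: theta_site[name] - theta_pre[name] for name in theta_pre}


def fuse(theta_pre, data_vectors, scale=1.0):
    """Synthetic weights: theta_mix = theta_pre + scale * sum_j tau_j.

    `scale` is a hypothetical knob (assumption); scale=1.0 is the plain sum.
    """
    theta_mix = {}
    for name, weights in theta_pre.items():
        tau_sum = sum(tau[name] for tau in data_vectors)
        theta_mix[name] = weights + scale * tau_sum
    return theta_mix


# Toy stand-ins for a shared pre-trained model and two hospitals' fine-tuned copies.
theta_pre = {"fc.weight": np.zeros((2, 3)), "fc.bias": np.zeros(2)}
theta_a = {"fc.weight": np.full((2, 3), 0.5), "fc.bias": np.array([1.0, 0.0])}
theta_b = {"fc.weight": np.full((2, 3), -0.25), "fc.bias": np.array([0.0, 2.0])}

# Each hospital shares only its data vector, never its images.
taus = [data_vector(theta_a, theta_pre), data_vector(theta_b, theta_pre)]
theta_mix = fuse(theta_pre, taus)
```

Because each data vector depends only on that hospital's fine-tuned weights and the common starting point, the hospitals can compute and submit their vectors at different times, which is the asynchronous property the abstract contrasts with federated learning.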

Paper Structure (15 sections, 6 equations, 2 figures, 4 tables)

Introduction
Problem Setting
Method
Fine-tuning of pretrained models
Calculate data vectors
Synthetic total vector
Generate a fusion model
Restore the model state
Experiments
Implementation details
Interpretation of results
Appendix
Endoscopic Bladder Tissue
PAD-UFES-20
Retina

Figures (2)

Figure 1: Overview of the proposed methodology; each stage is described in detail in the Method section.
Figure 2: Explanation of the direction in which the data vector moves.
