Prompt-based Personalized Federated Learning for Medical Visual Question Answering
He Zhu, Ren Togo, Takahiro Ogawa, Miki Haseyama
TL;DR
This work tackles data heterogeneity and privacy in medical VQA by introducing a prompt-based personalized federated learning (pFL) framework. Each client keeps its medical data private and communicates through small learnable prompts rather than full model weights, guided by a reliability-weighted aggregation of shared prompts, with a prompt-based residual attention (pRA) transformer enabling efficient cross-client cooperation. The method optimizes a combined loss $\mathcal{L}_{client} = \mathcal{L}_{CE} + \alpha \mathcal{L}_d + \beta \mathcal{R}$ and uses the shared prompt $\bm{p}_t^s = \sum_{i \neq t} \eta_i \bm{p}_i^p$, where $\eta_i = \frac{acc_i \cdot \mathrm{cs}(\bm{p}_i^p, \bm{p}_t^p)}{\sum_{j \neq t} acc_j \cdot \mathrm{cs}(\bm{p}_j^p, \bm{p}_t^p)}$ and $\mathcal{L}_d = 1 - \frac{\bm{p}_t^p \cdot \bm{p}_t^s}{\|\bm{p}_t^p\|_2 \cdot \|\bm{p}_t^s\|_2}$. Experiments on Slake and VQA-RAD show improved accuracy and substantially reduced parameter exchange (about 0.05% of that exchanged by prior pFL methods), with text prompts proving particularly beneficial for some datasets. The findings highlight the practicality of prompt-based pFL for privacy-preserving, communication-efficient medical VQA and its potential extension to other transformer-based VQA baselines.
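The reliability-weighted aggregation and the distance term $\mathcal{L}_d$ can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that $acc_i$ is a per-client accuracy, that $\mathrm{cs}$ is cosine similarity, and that each personal prompt is flattened into a plain vector. The function names (`cosine_similarity`, `aggregate_shared_prompt`, `distance_loss`) and the toy values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed meaning of cs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def aggregate_shared_prompt(prompts, accs, t):
    """Build the shared prompt p_t^s for target client t.

    prompts: list of flattened personal prompts p_i^p, one per client
    accs:    list of per-client accuracies acc_i (assumed to lie in [0, 1])
    Returns (shared_prompt, eta), where eta maps client index -> weight.
    """
    others = [i for i in range(len(prompts)) if i != t]
    # Unnormalized reliability: accuracy times similarity to the target prompt.
    raw = {i: accs[i] * cosine_similarity(prompts[i], prompts[t]) for i in others}
    total = sum(raw.values())
    if total <= 0:
        # Guard added for this sketch only; the source does not discuss this case.
        raise ValueError("reliability weights must sum to a positive value")
    eta = {i: raw[i] / total for i in others}
    dim = len(prompts[t])
    shared = [sum(eta[i] * prompts[i][k] for i in others) for k in range(dim)]
    return shared, eta


def distance_loss(p_personal, p_shared):
    """L_d = 1 - cosine similarity between the personal and shared prompts."""
    return 1.0 - cosine_similarity(p_personal, p_shared)


# Toy example: three clients with 2-dimensional prompts (illustrative values).
prompts = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
accs = [0.8, 0.9, 0.5]

shared, eta = aggregate_shared_prompt(prompts, accs, t=0)
loss = distance_loss(prompts[0], shared)
print(eta, shared, loss)
```

In this toy setting, client 2's prompt is orthogonal to the target client's prompt, so its weight is zero regardless of its accuracy, which is the intended effect of combining performance and relevance in $\eta_i$.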
Abstract
We present a novel prompt-based personalized federated learning (pFL) method to address data heterogeneity and privacy concerns in traditional medical visual question answering (VQA) methods. Specifically, we regard medical datasets from different organs as clients and use pFL to train a personalized transformer-based VQA model for each client. To address the high computational complexity of client-to-client communication in previous pFL methods, we propose a succinct information-sharing system built on prompts, which are small learnable parameters. In addition, the proposed method introduces a reliability parameter to prevent the negative effects of low-performing and irrelevant clients. Finally, extensive evaluations on various heterogeneous medical datasets attest to the effectiveness of the proposed method.
