Prompt-based Personalized Federated Learning for Medical Visual Question Answering

He Zhu; Ren Togo; Takahiro Ogawa; Miki Haseyama

TL;DR

This work tackles data heterogeneity and privacy in medical VQA by introducing a prompt-based personalized federated learning (pFL) framework. Each client maintains private medical data and communicates through small learnable prompts rather than full model weights, guided by a reliability-weighted aggregation of shared prompts, with a prompt-based residual attention (pRA) transformer enabling efficient cross-client cooperation. The method optimizes a combined loss $\mathcal{L}_{client} = \mathcal{L}_{CE} + \alpha \mathcal{L}_d + \beta \mathcal{R}$ and uses $\bm{p}_t^s = \sum_{i \neq t} \eta_i \bm{p}_i^p$ where $\eta_i = \frac{acc_i \cdot \mathrm{cs}(\bm{p}_i^p, \bm{p}_t^p)}{\sum_{j \neq t} acc_j \cdot \mathrm{cs}(\bm{p}_j^p, \bm{p}_t^p)}$, with $\mathcal{L}_d = 1 - \frac{\bm{p}_t^p \cdot \bm{p}_t^s}{\|\bm{p}_t^p\|_2 \cdot \|\bm{p}_t^s\|_2}$. Experiments on Slake and VQA-RAD show improved accuracy and substantially reduced parameter exchange (about 0.05% of prior pFL), with text prompts proving particularly beneficial for some datasets. The findings highlight the practicality of prompt-based pFL for privacy-preserving, communication-efficient medical VQA and its potential extension to other transformer-based VQA baselines.
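The reliability-weighted aggregation in the TL;DR can be illustrated with a minimal sketch. It assumes each prompt is flattened to a plain vector and that all similarity scores are non-negative so the weights form a valid normalization; the function names and toy values are hypothetical and are not taken from the paper's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity cs(u, v) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def shared_prompt(t, local_prompts, accuracies):
    """Return (eta, p_t^s) for client t.

    eta[i] is acc_i * cs(p_i^p, p_t^p), normalized over all clients i != t.
    p_t^s is the eta-weighted sum of the other clients' local prompts.
    """
    others = [i for i in range(len(local_prompts)) if i != t]
    scores = {
        i: accuracies[i] * cosine_similarity(local_prompts[i], local_prompts[t])
        for i in others
    }
    total = sum(scores.values())
    eta = {i: s / total for i, s in scores.items()}
    dim = len(local_prompts[t])
    p_shared = [sum(eta[i] * local_prompts[i][d] for i in others) for d in range(dim)]
    return eta, p_shared


# Hypothetical toy values: three clients with 2-dimensional local prompts.
prompts = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
accs = [0.8, 0.6, 0.9]
eta, p_s = shared_prompt(0, prompts, accs)
```

In this toy setup, client 2 has the highest accuracy but a prompt orthogonal to client 0's, so its weight is zero and the shared prompt for client 0 comes entirely from client 1. This mirrors the stated goal of limiting the influence of irrelevant clients.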

Abstract

We present a novel prompt-based personalized federated learning (pFL) method to address data heterogeneity and privacy concerns in traditional medical visual question answering (VQA) methods. Specifically, we regard medical datasets from different organs as clients and use pFL to train a personalized transformer-based VQA model for each client. To address the high computational complexity of client-to-client communication in previous pFL methods, we propose a succinct information-sharing system based on prompts, which are small learnable parameters. In addition, the proposed method introduces a reliability parameter to prevent the negative effects of low-performing and irrelevant clients. Finally, extensive evaluations on various heterogeneous medical datasets attest to the effectiveness of our proposed method.

Paper Structure

This paper contains 9 sections, 11 equations, 5 figures, 2 tables, 1 algorithm.

Introduction
Prompt-based Personalized Federated Learning for Medical VQA
Problem Formulation
Prompt-based Visual Question Answering Model
Client-to-Client Communication
Experiments
Settings
Results
Conclusion

Figures (5)

Figure 1: An overview of the proposed prompt-based pFL method. A client $c_t$ has private data $\mathcal{D}_t$ following the distribution $\mathcal{P}_t$, and a personalized client VQA model with the weight $\bm{w}_t$. The client uploads the local information and obtains shared information through client-to-client communication.
Figure 2: The proposed transformer encoder that introduces the prompt-based self-attention block. Local prompt $\bm{p}_t^p$ and shared prompt $\bm{p}_t^s$ are integrated into the process of self-attention, and the distance between them is controlled by the distance loss $\mathcal{L}_{d}$.
Figure 3: The proposed client-to-client communication process. Each client communicates to update $\bm{p}_t^p$ and get $\bm{p}_t^s$ from the other clients. The reliability parameter $\eta$ controls the weights of different clients in the communication process, which is computed from the accuracy $acc$ and the similarity function $\mathrm{cs}(\cdot)$.
Figure 4: The t-SNE visualization of the prompt embedding results on the Slake and VQA-RAD datasets. "Local, Shared" denotes the local and shared prompts of client 3, which is trained on the Abdomen sub-dataset. "Organs" denotes the local prompt of each client.
Figure 5: Samples of the generated answers on Slake. The results show the four clients we set up, with the corresponding sub-datasets in parentheses.
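
The distance loss $\mathcal{L}_{d}$ named in the Figure 2 caption is one minus the cosine similarity between the local and shared prompts, and it enters the client objective alongside the cross-entropy term and $\mathcal{R}$. The sketch below assumes flattened prompt vectors and treats $\mathcal{R}$ as an already-computed scalar; the function names and the values of `alpha` and `beta` are hypothetical placeholders, not settings reported in the paper.

```python
import math


def distance_loss(p_local, p_shared):
    """L_d = 1 - cosine similarity between the local and shared prompts."""
    dot = sum(a * b for a, b in zip(p_local, p_shared))
    norm_local = math.sqrt(sum(a * a for a in p_local))
    norm_shared = math.sqrt(sum(b * b for b in p_shared))
    return 1.0 - dot / (norm_local * norm_shared)


def client_loss(ce, l_d, reg, alpha, beta):
    """Combined objective L_client = L_CE + alpha * L_d + beta * R."""
    return ce + alpha * l_d + beta * reg


# Hypothetical toy values for illustration only.
aligned = distance_loss([1.0, 0.0], [2.0, 0.0])      # same direction
orthogonal = distance_loss([1.0, 0.0], [0.0, 3.0])   # unrelated direction
total = client_loss(ce=1.0, l_d=orthogonal, reg=2.0, alpha=0.1, beta=0.01)
```

Because the loss depends only on the angle between the two prompts, rescaling either prompt leaves it unchanged: it is 0 when the prompts point the same way and 1 when they are orthogonal.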
