Personalizing Federated Instrument Segmentation with Visual Trait Priors in Robotic Surgery

Jialang Xu, Jiacheng Wang, Lequan Yu, Danail Stoyanov, Yueming Jin, Evangelos B. Mazomenos

TL;DR

This work introduces PFedSIS, a personalized federated learning framework for surgical instrument segmentation that encodes visual trait priors through three components: Global-Personalized Disentanglement (GPD) for head-wise self-attention personalization, Appearance-regulation Personalized Enhancement (APE) to align local appearance via hypernetwork-guided updates, and Shape-similarity Global Enhancement (SGE) to preserve cross-site shape information. By decoupling global and personalized parameters and incorporating style-memory-based cross-style augmentation, PFedSIS achieves statistically significant improvements in Dice and IoU while reducing segmentation boundary errors across three diverse surgical datasets. The approach maintains real-time inference and demonstrates robustness to appearance heterogeneity and inter-site instrument-shape similarity, offering a privacy-preserving path toward site-tailored SIS models with practical clinical impact.

Abstract

Personalized federated learning (PFL) for surgical instrument segmentation (SIS) is a promising approach. It enables multiple clinical sites to collaboratively train a series of models while preserving privacy, with each model tailored to the data distribution of its site. Existing PFL methods rarely consider the personalization of multi-headed self-attention, and do not account for appearance diversity and instrument shape similarity, both inherent in surgical scenes. We thus propose PFedSIS, a novel PFL method with visual trait priors for SIS, incorporating global-personalized disentanglement (GPD), appearance-regulation personalized enhancement (APE), and shape-similarity global enhancement (SGE), to boost SIS performance at each site. GPD represents the first attempt at head-wise assignment for multi-headed self-attention personalization. To preserve the unique appearance representation of each site and gradually leverage inter-site differences, APE introduces appearance regulation and provides customized layer-wise aggregation solutions via hypernetworks for each site's personalized parameters. The mutual shape information of instruments is maintained and shared via SGE, which enhances cross-style shape consistency at the image level and computes the shape-similarity contribution of each site at the prediction level for updating the global parameters. PFedSIS outperforms state-of-the-art methods, with gains of +1.51% Dice and +2.11% IoU and reductions of 2.79 in ASSD and 15.55 in HD95. The corresponding code and models will be released at https://github.com/wzjialang/PFedSIS.
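
The head-wise idea behind GPD can be illustrated with a small sketch. This is not the authors' implementation: the function name `aggregate_head_wise`, the flattened per-head parameter vectors, the choice of which heads are personalized, and the sample-count weighting (in the style of federated averaging) are all assumptions made for illustration. It only shows the general pattern of averaging "global" attention heads across sites while each site keeps its "personalized" heads local.

```python
from typing import Dict, List, Set

Vector = List[float]
SiteHeads = Dict[int, Vector]  # head index -> flattened head parameters (hypothetical layout)


def aggregate_head_wise(
    site_heads: List[SiteHeads],
    site_sizes: List[int],
    personalized_heads: Set[int],
) -> List[SiteHeads]:
    """Average global heads across sites; leave personalized heads untouched.

    Assumption: global heads are combined by a sample-count-weighted mean,
    which is a stand-in for whatever server-side rule the paper uses.
    """
    total = float(sum(site_sizes))
    weights = [n / total for n in site_sizes]
    head_ids = list(site_heads[0].keys())

    # Weighted mean of every head that is NOT marked as personalized.
    global_avg: SiteHeads = {}
    for h in head_ids:
        if h in personalized_heads:
            continue
        dim = len(site_heads[0][h])
        global_avg[h] = [
            sum(w * heads[h][i] for w, heads in zip(weights, site_heads))
            for i in range(dim)
        ]

    # Each site receives the shared global heads and keeps its own personalized heads.
    updated: List[SiteHeads] = []
    for heads in site_heads:
        new_heads: SiteHeads = {}
        for h in head_ids:
            if h in personalized_heads:
                new_heads[h] = list(heads[h])
            else:
                new_heads[h] = list(global_avg[h])
        updated.append(new_heads)
    return updated
```

For instance, with two equally sized sites and head 1 marked as personalized, head 0 ends up identical at both sites after aggregation, while head 1 keeps its site-specific values.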

Paper Structure (30 sections, 14 equations, 3 figures, 4 tables, 1 algorithm)


Figures (3)

  • Figure 1: Our proposed PFedSIS. (a) The overview of the PFedSIS architecture. The global-personalized disentanglement (GPD), appearance-regulation personalized enhancement (APE), and shape-similarity global enhancement (SGE) modules are highlighted with gainsboro, lilac, and lemon backgrounds, respectively. In APE, considering site $m$'s hypernetwork $HN^m(\nu^m;\varphi^m)$ as an example, $\nu^m$ and $\varphi^m$ are updated based on the change of site $m$'s personalized parameters $\Delta \theta_P^m$. ①--⑦ represent the workflow steps of PFedSIS; (b) Illustrations of the multi-headed self-attention in GPD.
  • Figure 2: Visual comparison of different methods on three datasets. Yellow boxes highlight the regions where significant differences exist between the methods.
  • Figure 3: The heatmap of aggregation matrices $\Omega_{HN}^1, \Omega_{HN}^2, \Omega_{HN}^3$ of the regulation head $H_{ar}$ generated by all three sites' hypernetworks. X-axis and Y-axis show the IDs of sites. Each row represents the aggregation weights of that site's regulation head.
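
To make the row-wise reading of the aggregation matrices in Figure 3 concrete, the sketch below applies one such matrix to per-site parameter vectors. It is a minimal illustration under stated assumptions, not the paper's code: the function name `aggregate_with_matrix`, the flattened parameter vectors, and the interpretation that row $m$ gives the weights with which site $m$ mixes every site's regulation-head parameters are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def aggregate_with_matrix(omega: Matrix, site_params: Matrix) -> Matrix:
    """Mix per-site parameter vectors using one row of weights per site.

    omega[m][j] is assumed to be the weight site m assigns to site j's
    parameters; site_params[j] is site j's flattened parameter vector.
    """
    num_sites = len(site_params)
    if any(len(row) != num_sites for row in omega) or len(omega) != num_sites:
        raise ValueError("omega must be a square matrix with one row per site")

    dim = len(site_params[0])
    mixed: Matrix = []
    for m in range(num_sites):
        # Weighted sum over all sites' parameters, using row m of omega.
        mixed.append([
            sum(omega[m][j] * site_params[j][i] for j in range(num_sites))
            for i in range(dim)
        ])
    return mixed
```

Under this reading, a row close to a one-hot vector on the diagonal means the site relies almost entirely on its own regulation head, while a flatter row means it borrows more from the other sites.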