UltraFlwr -- An Efficient Federated Surgical Object Detection Framework

Yang Li, Soumya Snigdha Kundu, Maxence Boels, Toktam Mahmoodi, Sebastien Ourselin, Tom Vercauteren, Prokar Dasgupta, Jonathan Shapey, Alejandro Granados

TL;DR

UltraFlwr tackles privacy-preserving, multi-institutional surgical object detection by integrating Ultralytics YOLO with the Flower federated learning (FL) framework and introducing Partial Aggregation (PA) of detector components. Selectively aggregating the backbone and neck yields substantial communication savings with performance close to full aggregation. Heterogeneity plays a nuanced role: intra-client consistency can mitigate the downsides of distribution shift, while missing labels severely degrade performance. Across IID and clinically motivated heterogeneous settings, FL can narrow inter-client performance gaps, and the results offer practical guidance for deploying YOLO-based detection in diverse surgical environments. Overall, UltraFlwr provides a scalable, edge-friendly platform for federated surgical vision that balances privacy, efficiency, and effectiveness.

Abstract

Surgical object detection in laparoscopic videos enables real-time instrument identification for workflow analysis and skills assessment, but training robust models such as You Only Look Once (YOLO) is challenged by limited data, privacy constraints, and inter-institutional variability. Federated learning (FL) enables collaborative training without sharing raw data, yet practical support for modern YOLO pipelines under heterogeneous surgical data remains limited. We present UltraFlwr, an open-source, communication-efficient, and edge-deployable framework that integrates Ultralytics YOLO with the Flower FL platform and supports native Partial Aggregation (PA) of YOLO components (backbone, neck, head). Using two public laparoscopic surgical tool detection datasets, we conduct a systematic empirical study of federated YOLO training under Independent and Identically Distributed (IID) and multiple clinically motivated heterogeneous scenarios, including differences in data curation, video length, and label availability. Results show that standard FL aggregators (e.g., FedAvg) do not consistently match centralized training per client, but reduce inter-client performance variability. Aggregating both backbone and neck components achieves performance comparable to full aggregation with lower communication costs. Also, improving within-client data consistency can benefit FL even when it increases distribution shift across clients. These findings provide practical guidance for deploying federated YOLO-based object detection in heterogeneous surgical environments. UltraFlwr is publicly available at https://github.com/KCL-BMEIS/UltraFlwr.
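
To make the idea of Partial Aggregation concrete, the following is a minimal sketch of a FedAvg-style weighted average restricted to the backbone and neck, with head parameters left local to each client. It is not UltraFlwr's implementation: the function names (partial_fedavg, apply_update), the parameter-name prefixes ("backbone.", "neck.", "head."), and the use of flat Python lists for weights are all assumptions made for illustration. How real Ultralytics YOLO checkpoints map parameters to components should be checked in the repository.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: parameter name -> flattened weights.
State = Dict[str, List[float]]


def partial_fedavg(
    client_states: Sequence[State],
    num_examples: Sequence[int],
    shared_prefixes: Tuple[str, ...] = ("backbone.", "neck."),
) -> State:
    """Weighted-average only parameters whose name starts with a shared prefix."""
    if not client_states or len(client_states) != len(num_examples):
        raise ValueError("need one example count per client state")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    aggregated: State = {}
    for name, reference in client_states[0].items():
        if not name.startswith(shared_prefixes):
            continue  # e.g. head parameters are never sent or averaged
        acc = [0.0] * len(reference)
        for state, n in zip(client_states, num_examples):
            for i, w in enumerate(state[name]):
                acc[i] += w * n / total  # FedAvg weighting by local data size
        aggregated[name] = acc
    return aggregated


def apply_update(local: State, aggregated: State) -> State:
    """Overwrite shared parameters with the aggregate; keep the rest local."""
    merged = dict(local)
    merged.update(aggregated)
    return merged


if __name__ == "__main__":
    clients = [
        {"backbone.w": [1.0, 2.0], "neck.w": [0.0], "head.w": [5.0]},
        {"backbone.w": [3.0, 4.0], "neck.w": [4.0], "head.w": [9.0]},
    ]
    agg = partial_fedavg(clients, num_examples=[1, 3])
    print(agg)
    print(apply_update(clients[0], agg))
```

In this sketch the communication saving comes from exchanging only the entries selected by shared_prefixes; each client keeps its own head after every round.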

Paper Structure

This paper contains 20 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Dataset distribution across clients in the $G_{\text{combined}}$ experimental settings, constructed from m2cai16-tool-locations and CholecTrack20. Each client box shows train/validation/test splits. Three settings are considered: (a) IID ($G_{\text{combined-a}}$), where training videos are pooled and randomly redistributed across clients (green); (b) curation-based heterogeneity ($G_{\text{combined-b}}$), where client 0 contains only m2cai16-tool-locations videos (v*), while clients 1 and 2 contain disjoint subsets of CholecTrack20 videos (VID*); and (c) Leave-Multiple-Out (LMO) label heterogeneity ($G_{\text{combined-c}}$), where client 0 retains annotations for all tool classes, while clients 1 and 2 are restricted to energy-based and cold-dissection tools (orange). Training videos are non-overlapping and balanced with best effort across clients by matching video lengths across datasets. The re-curated m2cai16-tool-locations subset is indicated by the red dashed box. VID30 and VID31 are excluded due to corrupted labels.
  • Figure 2: Dataset distribution across clients in the $G_{\text{track20}}$ experimental settings, constructed using CholecTrack20 only. Within each client, cells correspond to train/validation/test splits. Four settings are considered: (a) IID ($G_{\text{track20-a}}$), where training videos are pooled and randomly redistributed across clients (green); (b) length-based heterogeneity ($G_{\text{track20-b}}$), where clients are assigned long, medium, or short videos; (c) Leave-Multiple-Out (LMO) label heterogeneity ($G_{\text{track20-c}}$), where subsets of tool annotations are removed at specific clients (orange); and (d) combined length and LMO heterogeneity ($G_{\text{track20-d}}$). Training videos are non-overlapping across clients. VID30 and VID31 are excluded due to corrupted labels.
  • Figure 3: Training progression of mAP50 under different aggregation strategies in $G_{\text{combined}}$ and $G_{\text{track20}}$ settings. First row is $G_{\text{combined}}$, and each column corresponds to a different setting: (a) IID, (b) heterogeneous by data curation, and (c) heterogeneous by LMO. Second row is $G_{\text{track20}}$, and each column corresponds to a different setting: (a) IID, (b) heterogeneous by video length, (c) heterogeneous by LMO, and (d) heterogeneous by LMO and video length.
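
The IID and Leave-Multiple-Out (LMO) settings described in the captions above can be sketched roughly as follows. This is an illustrative sketch, not the authors' data pipeline: the function names (iid_split, leave_multiple_out), the seed, and the class ids in the usage example are hypothetical. The only format assumption is the standard YOLO label line, "class x_center y_center width height". The sketch also ignores the best-effort balancing by video length mentioned for Figure 1.

```python
import random
from typing import Dict, List, Sequence, Set


def iid_split(
    video_ids: Sequence[str], num_clients: int, seed: int = 0
) -> Dict[int, List[str]]:
    """Pool videos, shuffle them, and deal them out round-robin to clients."""
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    rng = random.Random(seed)
    shuffled = list(video_ids)
    rng.shuffle(shuffled)
    # Slicing with a stride gives disjoint, near-equal-sized subsets.
    return {c: shuffled[c::num_clients] for c in range(num_clients)}


def leave_multiple_out(
    label_lines: Sequence[str], kept_class_ids: Set[int]
) -> List[str]:
    """Keep only YOLO-format annotation lines whose class id is retained."""
    kept = []
    for line in label_lines:
        fields = line.split()
        if fields and int(fields[0]) in kept_class_ids:
            kept.append(line)
    return kept


if __name__ == "__main__":
    videos = [f"VID{i:02d}" for i in range(1, 8)]
    print(iid_split(videos, num_clients=3, seed=42))

    # Hypothetical class ids: this client keeps annotations for classes 0 and 5.
    labels = ["0 0.5 0.5 0.2 0.2", "3 0.1 0.1 0.05 0.05", "5 0.4 0.4 0.1 0.1"]
    print(leave_multiple_out(labels, kept_class_ids={0, 5}))
```

Filtering annotations this way leaves the images in place, so tools whose labels were removed still appear in the frames without boxes. This is one plausible reading of the LMO setting.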