UltraFlwr -- An Efficient Federated Surgical Object Detection Framework
Yang Li, Soumya Snigdha Kundu, Maxence Boels, Toktam Mahmoodi, Sebastien Ourselin, Tom Vercauteren, Prokar Dasgupta, Jonathan Shapey, Alejandro Granados
TL;DR
UltraFlwr tackles privacy-preserving, multi-institutional surgical object detection by integrating Ultralytics YOLO with the Flower federated learning (FL) framework and introducing Partial Aggregation (PA) of detector components (backbone, neck, head). Aggregating only the backbone and neck lowers communication cost while keeping performance close to full aggregation. Heterogeneity plays a nuanced role: improving within-client data consistency can benefit FL even when it increases distribution shift across clients, while missing labels severely degrade performance. Across IID and clinically motivated heterogeneous settings, FL can narrow inter-client performance gaps, although it does not consistently match centralized training per client. These results offer practical guidance for deploying YOLO-based detection in diverse surgical environments. Overall, UltraFlwr provides a scalable, edge-friendly platform for federated surgical vision, balancing privacy, efficiency, and effectiveness.
Abstract
Surgical object detection in laparoscopic videos enables real-time instrument identification for workflow analysis and skills assessment, but training robust models such as You Only Look Once (YOLO) is challenged by limited data, privacy constraints, and inter-institutional variability. Federated learning (FL) enables collaborative training without sharing raw data, yet practical support for modern YOLO pipelines under heterogeneous surgical data remains limited. We present UltraFlwr, an open-source, communication-efficient, and edge-deployable framework that integrates Ultralytics YOLO with the Flower FL platform and supports native Partial Aggregation (PA) of YOLO components (backbone, neck, head). Using two public laparoscopic surgical tool detection datasets, we conduct a systematic empirical study of federated YOLO training under Independent and Identically Distributed (IID) and multiple clinically motivated heterogeneous scenarios, including differences in data curation, video length, and label availability. Results show that standard FL aggregators (e.g., FedAvg) do not consistently match centralized training per client, but reduce inter-client performance variability. Aggregating both backbone and neck components achieves performance comparable to full aggregation with lower communication costs. Also, improving within-client data consistency can benefit FL even when it increases distribution shift across clients. These findings provide practical guidance for deploying federated YOLO-based object detection in heterogeneous surgical environments. UltraFlwr is publicly available at https://github.com/KCL-BMEIS/UltraFlwr.
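The idea behind Partial Aggregation can be illustrated with a minimal sketch. It is not UltraFlwr's or Flower's actual API. It assumes a hypothetical layout in which every parameter name is prefixed by its component ("backbone.", "neck.", "head.") and each tensor is flattened to a list of floats. Shared components are averaged with FedAvg-style weights proportional to client dataset size. Non-shared components are left untouched on the server, standing in for each client keeping its own local copy.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: parameter name -> flattened weights.
# The text before the first "." names the component (backbone, neck, head).
Params = Dict[str, List[float]]


def component_of(name: str) -> str:
    """Return the component prefix of a parameter name."""
    return name.split(".", 1)[0]


def partial_fedavg(
    global_params: Params,
    client_params: Sequence[Params],
    client_sizes: Sequence[int],
    shared: Sequence[str] = ("backbone", "neck"),
) -> Params:
    """Weighted-average only the shared components; keep the rest as-is."""
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")

    new_global: Params = {}
    for name, old in global_params.items():
        if component_of(name) not in shared:
            # Not aggregated: the server copy is unchanged (assumption:
            # clients retain their own local version of this component).
            new_global[name] = list(old)
            continue
        avg = [0.0] * len(old)
        for params, n in zip(client_params, client_sizes):
            weight = n / total  # FedAvg weight: share of total samples
            for i, value in enumerate(params[name]):
                avg[i] += weight * value
        new_global[name] = avg
    return new_global


def communicated_fraction(params: Params, shared: Sequence[str]) -> float:
    """Fraction of scalar parameters that would be exchanged per round."""
    sent = sum(len(v) for k, v in params.items() if component_of(k) in shared)
    return sent / sum(len(v) for v in params.values())


if __name__ == "__main__":
    server = {"backbone.w": [0.0, 0.0], "neck.w": [0.0], "head.w": [5.0]}
    c1 = {"backbone.w": [1.0, 2.0], "neck.w": [4.0], "head.w": [10.0]}
    c2 = {"backbone.w": [3.0, 6.0], "neck.w": [8.0], "head.w": [20.0]}
    print(partial_fedavg(server, [c1, c2], [1, 3]))
    print(communicated_fraction(server, ("backbone", "neck")))
```

In this toy setup the head is never exchanged, so the communicated fraction depends only on how many parameters sit in the backbone and neck. For a real YOLO model the saving depends on the actual size of the detection head, which this sketch does not model.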
