Privacy-Preserving Synthetic Continual Semantic Segmentation for Robotic Surgery

Mengya Xu; Mobarakol Islam; Long Bai; Hongliang Ren

TL;DR

This work tackles catastrophic forgetting in continual semantic segmentation for robotic surgery under privacy constraints by introducing a privacy-preserving synthetic framework called CAT-SD. It blends open-source old-instrument foregrounds with synthetic/augmented backgrounds and uses class-aware temperature normalization (CAT) together with multi-scale shifted-feature distillation (SD) to preserve old knowledge while learning new instruments, aided by synthetic pseudo-exemplars generated with StyleGAN-XL and blending/harmonization. The approach outperforms baseline continual learning methods on EndoVis 2017/2018, with ablations confirming the importance of CAT and SD and robustness analyses demonstrating resilience to input perturbations. This framework reduces data collection and privacy risks while enabling continual updates to instrument repertoires in robot-assisted surgery, with potential for incremental domain adaptation in future work.

Abstract

Deep neural network (DNN) based semantic segmentation of robotic instruments and tissues can enhance the precision of surgical activities in robot-assisted surgery. However, unlike biological learning, DNNs cannot learn incremental tasks over time and exhibit catastrophic forgetting: a sharp decline in performance on previously learned tasks after learning a new one. In particular, when data are scarce, the model's performance on previously learned instruments drops rapidly after it is trained on data containing new instruments. The problem worsens when privacy concerns prevent the release of the old-instrument dataset used to train the old model, and when data for new or updated versions of the instruments are unavailable to the continual learning model. To address this, we develop a privacy-preserving synthetic continual semantic segmentation framework by blending and harmonizing (i) open-source old-instrument foregrounds with synthesized backgrounds, so that no real patient data is revealed publicly, and (ii) new-instrument foregrounds with extensively augmented real backgrounds. To balance logit distillation from the old model to the continual learning model, we design overlapping class-aware temperature normalization (CAT), which controls model learning utility. We also introduce multi-scale shifted-feature distillation (SD) to maintain long- and short-range spatial relationships among semantic objects, since conventional short-range spatial features carry limited information and weaken feature distillation. We demonstrate the effectiveness of our framework on the EndoVis 2017 and 2018 instrument segmentation datasets in a generalized continual learning setting. Code is available at https://github.com/XuMengyaAmy/Synthetic_CAT_SD.
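
To make the idea of temperature-scaled logit distillation concrete, the following is a minimal NumPy sketch of generic knowledge distillation from an old model to a continual learning model. It is not the paper's CAT formulation, which this summary does not spell out. The function names, the per-class temperature vector, and the choice to distill only over the old model's classes are all assumptions made for illustration.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(old_logits, new_logits, temperature=1.0):
    """KL(old || new) between temperature-softened class distributions.

    old_logits:  (N, n) per-pixel logits from the old model (n old classes).
    new_logits:  (N, n + m) per-pixel logits from the continual model.
    temperature: scalar, or an (n,) vector giving one temperature per class
                 (ASSUMPTION: a hypothetical stand-in for class awareness).
    """
    n = old_logits.shape[1]
    t = np.asarray(temperature, dtype=float)
    log_p_old = log_softmax(old_logits / t)
    # ASSUMPTION: only the old-class logits of the new model are distilled.
    log_p_new = log_softmax(new_logits[:, :n] / t)
    kl = (np.exp(log_p_old) * (log_p_old - log_p_new)).sum(axis=-1)
    return float(kl.mean())

old = np.array([[2.0, 0.5, -1.0]])
new_same = np.array([[2.0, 0.5, -1.0, 3.0]])   # agrees on the old classes
new_diff = np.array([[-1.0, 0.5, 2.0, 3.0]])   # disagrees on the old classes
per_class_t = np.array([1.0, 2.0, 4.0])        # hypothetical temperatures

print(distill_loss(old, new_same, per_class_t))
print(distill_loss(old, new_diff, per_class_t))
```

The loss is zero when the continual model reproduces the old model's logits on the old classes and grows as the two diverge. Raising the temperature of a class flattens its contribution, which is one plausible way to rebalance how strongly each class is distilled.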

Paper Structure (25 sections, 12 equations, 8 figures, 7 tables)

Introduction
Related work
Continual learning methods
Exemplar-based methods
Exemplar-free methods
Image synthesis
Methodology
Preliminary
Continual learning with logit distillation
Continual learning with feature distillation
Multi-scale pooling
Temperature normalization
Privacy-preserving Synthetic Continual Semantic Segmentation
Class-Aware Temperature-normalization (CAT)
Multi-scale Shifted-feature Distillation (SD)
...and 10 more sections

Figures (8)

Figure 1: Instrument classes in our continual learning settings. The old non-overlapping instruments in EndoVis 2017 are Vessel Sealer and Grasping Retractor, and the new non-overlapping instruments in EndoVis 2018 are Suction and Clip Applier. The other, regular overlapping instruments appear in both EndoVis 2017 and EndoVis 2018.
Figure 2: Overview of our proposed privacy-preserving CAT-SD continual learning approach. The old model consists of model weights released by a hospital without sharing the training data (in our case, the EndoVis 2017 dataset (Allan et al., 2019)), and it can recognize $n$ classes. Our pseudo-rehearsal-based CAT-SD approach aims to learn a continual learning model that can handle the $m$ new classes from the EndoVis 2018 dataset (Allan et al., 2020) while mitigating catastrophic forgetting. CAT-SD consists of three modules: (i) Blending and Harmonization, which synthesizes surgical background images to blend with old non-overlapping instruments and uses publicly available real surgical backgrounds to blend with overlapping and new non-overlapping instruments, ensuring privacy-preserving continual learning; (ii) Multi-scale Shifted-feature Distillation (SD), which enhances feature distillation and maintains long- and short-range spatial relationships among the semantic objects; (iii) Class-Aware Temperature-normalization (CAT), which tackles imbalanced learning between old and new classes in logit distillation.
Figure 3: Multi-scale Shifted-feature Distillation (SD). Two intermediate feature embeddings are obtained, one from the old model at task $t-1$ and one from the continual learning model at task $t$. The first two regular scales, $s=2$ and $s=4$, are equivalent to Local POD (Douillard et al., 2021). We first divide the feature embedding equally into $2^s$ sub-region feature embeddings at scale $s$, so $2^2$ and $2^4$ sub-regions are created at scales $s=2$ and $s=4$, respectively. Starting from scale $s=4$, we group adjacent interior sub-regions to form irregular, unequal sub-regions, which we call the shifted embedding tensor. We then compute the width and height pooling slices for each sub-region and concatenate all of them. Feature distillation between the old and continual learning models is performed on the concatenated features.
Figure 4: The multi-class image synthesis process in our work consists of $4$ steps: 1) selecting source images; 2) generating the augmented background and foreground images; 3) blending background and foreground images randomly; and 4) harmonizing the blended images. One background tissue image and $2$ foreground instrument images are stored as source images for each instrument. After augmentation, the background image has $50$ variations and the foreground image has $100$ variations. Background and foreground variations are blended randomly, with at most $3$ tools appearing simultaneously. Finally, the blended images are harmonized to obtain more realistic images.
Figure 5: Privacy-preserving pseudo-exemplar. When presented with cropped background instances from the real dataset, the discriminator should recognize them as genuine. Meanwhile, the generator produces synthetic background images and sends them to the discriminator. In pseudo-rehearsal, the foreground instruments are blended with the background tissue generated by the GAN model.
...and 3 more figures
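
The pooling scheme described in the Figure 3 caption can be sketched in NumPy as follows. This is an illustrative reading of the caption, not the authors' implementation. The helper names are hypothetical, the shifted partition (merging the two interior cells of the 4-cell grid along each axis, giving bands of 1, 2, and 1 cells) is an assumption about how "adjacent interior sub-regions" are grouped, and the Euclidean distance between embeddings is an assumed choice of distillation loss.

```python
import numpy as np

def grid_edges(size, grid):
    """Pixel boundaries cutting an axis of length `size` into `grid` equal cells."""
    return [k * size // grid for k in range(grid + 1)]

def shifted_edges(size):
    """ASSUMPTION: merge the two interior cells of the 4-cell axis (bands of 1, 2, 1)."""
    e = grid_edges(size, 4)
    return [e[0], e[1], e[3], e[4]]

def split(feat, row_edges, col_edges):
    """Cut a (C, H, W) feature map into rectangular sub-regions."""
    return [feat[:, r0:r1, c0:c1]
            for r0, r1 in zip(row_edges[:-1], row_edges[1:])
            for c0, c1 in zip(col_edges[:-1], col_edges[1:])]

def strip_pool(region):
    """Concatenate the width-pooled and height-pooled slices of one region."""
    width_pooled = region.mean(axis=2)   # shape (C, h): averaged over width
    height_pooled = region.mean(axis=1)  # shape (C, w): averaged over height
    return np.concatenate([width_pooled.ravel(), height_pooled.ravel()])

def sd_embedding(feat):
    """Pooled slices from the 2x2 grid, the 4x4 grid, and the shifted partition.

    Expects H and W of at least 4 so that no sub-region is empty.
    """
    _, H, W = feat.shape
    regions = []
    for grid in (2, 4):  # 2^2 = 4 and 2^4 = 16 regular sub-regions
        regions += split(feat, grid_edges(H, grid), grid_edges(W, grid))
    regions += split(feat, shifted_edges(H), shifted_edges(W))  # 9 unequal regions
    return np.concatenate([strip_pool(r) for r in regions])

def sd_loss(old_feat, new_feat):
    """ASSUMPTION: Euclidean distance between the two concatenated embeddings."""
    return float(np.linalg.norm(sd_embedding(old_feat) - sd_embedding(new_feat)))

rng = np.random.default_rng(0)
old_feat = rng.random((3, 8, 8))            # stand-in for the old model's features
new_feat = old_feat + 0.1 * rng.random((3, 8, 8))
print(sd_embedding(old_feat).shape, sd_loss(old_feat, new_feat))
```

Under this reading, the regular grids capture short-range structure inside each cell, while the merged interior bands of the shifted partition span former cell boundaries and so carry longer-range context into the same concatenated embedding.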
