A Bayesian Approach to Weakly-supervised Laparoscopic Image Segmentation

Zhou Zheng, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kensaku Mori

TL;DR

This work tackles weakly‑supervised laparoscopic image segmentation under sparse annotations by introducing a fully Bayesian framework that models the joint distribution $p(\mathbf{x},\mathbf{y})$ via latent variables, enabling sampling of high‑quality pseudo‑labels and explicit uncertainty estimation. The methodology combines a conditional variational auto‑encoder with a DenseCRF‑based CRF term to maximize an ELBO objective, while using two encoder–decoder streams to reconstruct images and generate segmentation maps from latent codes and image features. Empirical results on CholecSeg8k and AutoLaparo show state‑of‑the‑art performance among scribble/weakly‑supervised methods, with further demonstration on scribble‑supervised cardiac segmentation (ACDC) indicating cross‑domain generalizability. The approach provides uncertainty quantification via MC dropout and improves robustness to sparse supervision, at the cost of higher computational demand. The work contributes a principled Bayesian formulation, extensive validation, and public code for broader adoption in label‑efficient medical image segmentation.
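
As a rough guide to what "maximizing an ELBO" over the joint distribution could look like, the bound below is a generic sketch rather than the paper's exact objective. It uses the factorization $p(\mathbf{x},\mathbf{y}|\mathbf{z}) = p(\mathbf{x}|\mathbf{z})\,p(\mathbf{y}|\mathbf{x},\mathbf{z})$ given in the Figure 1 caption. The variational posterior $q(\mathbf{z}|\mathbf{x},\mathbf{y})$ and the prior $p(\mathbf{z})$ are assumptions introduced here for illustration, and the DenseCRF-based term is left out because this page does not state its form.

```latex
\log p(\mathbf{x},\mathbf{y})
  \;\ge\;
  \mathbb{E}_{q(\mathbf{z}|\mathbf{x},\mathbf{y})}
  \big[ \log p(\mathbf{x}|\mathbf{z}) + \log p(\mathbf{y}|\mathbf{x},\mathbf{z}) \big]
  \;-\;
  \mathrm{KL}\big( q(\mathbf{z}|\mathbf{x},\mathbf{y}) \,\|\, p(\mathbf{z}) \big)
```

Under this reading, the first expectation term would correspond to the image-reconstruction stream, the second to the label-generation stream, and the KL term would keep the latent code close to the prior so that new pairs can be sampled from it.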

Abstract

In this paper, we study weakly-supervised laparoscopic image segmentation with sparse annotations. We introduce a novel Bayesian deep learning approach designed to enhance both the accuracy and interpretability of the model's segmentation, founded upon a comprehensive Bayesian framework, ensuring a robust and theoretically validated method. Our approach diverges from conventional methods that directly train using observed images and their corresponding weak annotations. Instead, we estimate the joint distribution of both images and labels given the acquired data. This facilitates the sampling of images and their high-quality pseudo-labels, enabling the training of a generalizable segmentation model. Each component of our model is expressed through probabilistic formulations, providing a coherent and interpretable structure. This probabilistic nature benefits accurate and practical learning from sparse annotations and equips our model with the ability to quantify uncertainty. Extensive evaluations with two public laparoscopic datasets demonstrated the efficacy of our method, which consistently outperformed existing methods. Furthermore, our method was adapted for scribble-supervised cardiac multi-structure segmentation, presenting competitive performance compared to previous methods. The code is available at https://github.com/MoriLabNU/Bayesian_WSS.

Paper Structure

This paper contains 10 sections, 10 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Flowchart of the proposed framework. At the learning stage, we first learn $p(\mathbf{x},\mathbf{y}|\mathbf{z})$ by modeling $p(\mathbf{x}|\mathbf{z})$ for image reconstruction and $p(\mathbf{y}|\mathbf{x},\mathbf{z})$ for label generation. After obtaining $p(\mathbf{x},\mathbf{y}|\mathbf{z})$, we sample pairs of $\mathbf{x}$ and $\mathbf{y}$ from $p(\mathbf{x},\mathbf{y}|\mathbf{z})$ to learn a segmentation model, i.e., $p(\mathbf{w}|\mathbf{x},\mathbf{y})$. At the inference stage, we obtain the prediction and corresponding epistemic uncertainty estimation with MC dropout.
  • Figure 2: Ablation studies on efficacy of loss components, influence of sample time $N$, and impact of inference time $T$ with the CholecSeg8k dataset.
  • Figure 2: Network configuration for modeling $p(\mathbf{x},\mathbf{y}|\mathbf{z})$. For simplicity, specifics of the encoder and decoder layers are excluded, and skip connections are omitted.
  • Figure 3: An example of weak annotation simulation with skeletonization. The white area indicates the unlabeled region.
  • Figure 4: An example slice of the ACDC dataset.
  • ...and 2 more figures
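
The inference stage mentioned in the Figure 1 caption (a prediction plus an uncertainty estimate from MC dropout) can be illustrated with a minimal, dependency-free sketch. Everything in it is hypothetical: the toy linear classifier, the function names, and the use of predictive entropy as the uncertainty summary are assumptions for illustration, not the authors' implementation. The parameter `T` mirrors the inference-time count named in the ablation caption.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(features, weights, drop_prob, rng):
    """One pass of a toy linear classifier with dropout left active."""
    keep = 1.0 - drop_prob  # requires drop_prob < 1
    # Inverted dropout: zero a feature with probability drop_prob, rescale the rest.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, T=20, drop_prob=0.5, seed=0):
    """Average T stochastic passes; return (label, mean probabilities, entropy)."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, drop_prob, rng) for _ in range(T)]
    n_classes = len(weights)
    mean_probs = [sum(s[c] for s in samples) / T for c in range(n_classes)]
    # Predictive entropy of the averaged distribution (an assumed uncertainty summary).
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    label = max(range(n_classes), key=lambda c: mean_probs[c])
    return label, mean_probs, entropy


# Hypothetical two-class example for a single pixel with three features.
weights = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.5]]
label, probs, uncertainty = mc_dropout_predict([1.0, 0.2, 0.3], weights, T=50)
```

In a segmentation network the same averaging would be applied per pixel, which yields an uncertainty map alongside the predicted mask; a larger `T` gives a smoother estimate at proportionally higher inference cost.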

Theorems & Definitions (1)

  • proof