
COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images

Dmytro Shvetsov, Joonas Ariva, Marharyta Domnich, Raul Vicente, Dmytro Fishman

TL;DR

COIN addresses the scarcity of pixel-wise annotations in medical imaging by reframing weakly supervised semantic segmentation as a counterfactual inpainting problem. A perturbation-based GAN with a single conditioning inpaints abnormal regions to produce a normal counterfactual $X_{cf}$, and the resulting difference map $|X - X_{cf}|$ serves as the weak segmentation label, so no segmentation masks are required for training. The method optimizes a composite loss combining data fidelity, classifier consistency, domain self-consistency, and total-variation regularization. On synthetic kidney anomalies and real kidney tumors in CT scans, it shows strong segmentation performance and reliably flips the classifier's prediction, outperforming attribution methods and a modified Singla et al. baseline. The work offers a practical route to accurate medical image segmentation under weak supervision, where annotations are costly, and provides code for broader adoption. Extensions to 3D data and other domains are planned.
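The difference-map step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `weak_mask`, the max-normalization, and the 0.5 threshold are assumptions made for illustration, and the counterfactual here is a hand-built toy array rather than the output of a trained GAN.

```python
import numpy as np

def weak_mask(x, x_cf, threshold=0.5):
    """Return the normalized difference map |X - X_cf| and a binary mask.

    Both the max-normalization and the default threshold are illustrative
    assumptions, not values taken from the paper.
    """
    diff = np.abs(x.astype(np.float64) - x_cf.astype(np.float64))
    peak = diff.max()
    if peak > 0:
        diff = diff / peak  # scale to [0, 1] so the threshold is relative
    return diff, diff >= threshold

# Toy "abnormal" image: a bright 2x2 lesion on a dark background.
x = np.zeros((4, 4))
x[1:3, 1:3] = 1.0

# Hypothetical counterfactual in which the lesion has been inpainted away.
x_cf = np.zeros((4, 4))

diff, mask = weak_mask(x, x_cf)
print(mask.astype(int))
```

In this toy case the mask covers exactly the pixels that the counterfactual changed. With a real generator the difference map is noisy, so the choice of threshold matters.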

Abstract

Deep learning is dramatically transforming the field of medical imaging and radiology, enabling the identification of pathologies in medical images, including computed tomography (CT) and X-ray scans. However, the performance of deep learning models, particularly in segmentation tasks, is often limited by the need for extensive annotated datasets. To address this challenge, the capabilities of weakly supervised semantic segmentation are explored through the lens of Explainable AI and the generation of counterfactual explanations. This research develops a novel counterfactual inpainting approach (COIN) that uses a generative model to flip the predicted classification label from abnormal to normal. For instance, if the classifier deems an input medical image X abnormal, indicating the presence of a pathology, the generative model aims to inpaint the abnormal region, thus reversing the classifier's original prediction. The approach produces precise segmentations of pathologies without depending on pre-existing segmentation masks. Crucially, it relies only on image-level labels, which are substantially easier to acquire than detailed segmentation masks. The effectiveness of the method is demonstrated by segmenting synthetic targets and actual kidney tumors in CT images acquired from Tartu University Hospital in Estonia. The findings indicate that COIN greatly surpasses established attribution methods, such as RISE, ScoreCAM, and LayerCAM, as well as an alternative counterfactual explanation method introduced by Singla et al. This evidence suggests that COIN is a promising approach for semantic segmentation of tumors in CT images and a step toward making deep learning applications more accessible and effective in healthcare, where annotated data is scarce.

Paper Structure (34 sections, 11 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Overview of the proposed counterfactual inpainting (COIN) pipeline. Given the input image $X$ and a black-box classifier $f$ that produces a classification label, the image-to-image model (GAN) generates a counterfactual image $X_{cf}$ with $y = 0$. If $X$ is abnormal, $X_{cf}$ is expected to no longer contain the abnormal part of the input image. Computing the absolute difference between the original image $X$ and the counterfactual image $X_{cf}$ yields a weak tumor segmentation map. During training, only the GAN weights are updated. Classifier predictions are used to compute the classifier consistency loss.
  • Figure 2: Predictions of the attribution methods and the proposed counterfactual inpainting pipeline on the TotalSegmentator and TUH datasets. For each dataset, the bottom row depicts thresholded masks obtained from each method's saliency maps. In each mask, colors mark true positive (green), false positive (red), and false negative (yellow) predictions. White masks denote ground truth labels. Images are zoomed in for clarity.
  • Figure 3: Examples of the synthetic anomalies injected randomly inside kidneys for the TotalSegmentator dataset.
  • Figure 4: Examples of images generated with original and perturbation-based Singla et al.* pipelines.
  • Figure 5: Examples of images generated with and without skip-connections between encoder-decoder layers of the perturbation-based Singla et al.* pipeline.
  • ...and 1 more figure
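
The true positive, false positive, and false negative outcomes color-coded in Figure 2 can be computed from a thresholded mask and a ground-truth mask as in the sketch below. The helper name `outcome_counts` is hypothetical, and reporting intersection-over-union (IoU) is an assumption for illustration. The text above does not state which segmentation metric the paper uses.

```python
import numpy as np

def outcome_counts(pred, gt):
    """Count per-pixel outcomes of a predicted binary mask against ground truth."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.sum(pred & gt))    # predicted and truly abnormal (green)
    fp = int(np.sum(pred & ~gt))   # predicted but actually normal (red)
    fn = int(np.sum(~pred & gt))   # missed abnormal pixels (yellow)
    union = tp + fp + fn
    # IoU is an illustrative choice of metric; define it as 1.0 when both masks are empty.
    iou = tp / union if union else 1.0
    return {"tp": tp, "fp": fp, "fn": fn, "iou": iou}

# Toy 3x3 masks: two overlapping pixels, one spurious pixel, one missed pixel.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
gt = np.array([[0, 1, 0],
               [0, 1, 1],
               [0, 0, 0]])

stats = outcome_counts(pred, gt)
print(stats)  # {'tp': 2, 'fp': 1, 'fn': 1, 'iou': 0.5}
```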