HisynSeg: Weakly-Supervised Histopathological Image Segmentation via Image-Mixing Synthesis and Consistency Regularization

Zijie Fang; Yifeng Wang; Peizhang Xie; Zhi Wang; Yongbing Zhang

TL;DR

This work tackles pixel-level tissue segmentation under weak supervision by addressing two limitations of class activation maps (CAMs): under-activation and over-activation. It proposes HisynSeg, a framework that converts weakly-supervised learning into a fully-supervised regime through three components: image-mixing synthesis (Mosaic transformation and Bézier mask generation), a synthesized-image filtering module that keeps only realistic images, and a self-supervised consistency mechanism that lets real images without masks supervise training. The training objective combines segmentation, consistency, and classification losses, so that real and synthesized data jointly guide learning. Across three histopathology datasets, HisynSeg achieves state-of-the-art performance while relying only on weak annotations, which reduces the annotation burden for clinical deployment.

Abstract

Tissue semantic segmentation is one of the key tasks in computational pathology. To avoid the expensive and laborious acquisition of pixel-level annotations, a wide range of studies adopt the class activation map (CAM), a weakly-supervised learning scheme, to achieve pixel-level tissue segmentation. However, CAM-based methods are prone to under-activation and over-activation, leading to poor segmentation performance. To address this problem, we propose a novel weakly-supervised semantic segmentation framework for histopathological images based on image-mixing synthesis and consistency regularization, dubbed HisynSeg. Specifically, synthesized histopathological images with pixel-level masks are generated for fully-supervised model training, using two proposed synthesis strategies based on Mosaic transformation and Bézier mask generation. In addition, an image filtering module is developed to ensure the authenticity of the synthesized images. To further prevent the model from overfitting to occasional synthesis artifacts, we propose a novel self-supervised consistency regularization, which enables real images without segmentation masks to supervise the training of the segmentation model. By integrating these techniques, the HisynSeg framework transforms the weakly-supervised semantic segmentation problem into a fully-supervised one, greatly improving segmentation accuracy. Experimental results on three datasets demonstrate that the proposed method achieves state-of-the-art performance. Code is available at https://github.com/Vison307/HisynSeg.
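The Mosaic-style synthesis mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `mosaic_synthesize`, the `grid` parameter, and the assumption that every source image contains exactly one tissue class (so its image-level label can be copied to every pixel of a tile) are all hypothetical choices made for illustration.

```python
import numpy as np


def mosaic_synthesize(images, labels, grid=2, rng=None):
    """Stitch random tiles from single-class images into one image and mask.

    images: list of HxWx3 uint8 arrays with identical shape (assumption).
    labels: list of int class ids, one per image (assumption: each image
            shows a single tissue class).
    grid:   number of tiles per side of the synthesized image (hypothetical).
    """
    rng = np.random.default_rng(rng)
    h, w, _ = images[0].shape
    th, tw = h // grid, w // grid  # tile height and width
    out_img = np.zeros((th * grid, tw * grid, 3), dtype=np.uint8)
    out_mask = np.zeros((th * grid, tw * grid), dtype=np.int64)
    for r in range(grid):
        for c in range(grid):
            k = int(rng.integers(len(images)))    # choose a source image
            y = int(rng.integers(0, h - th + 1))  # random crop origin (row)
            x = int(rng.integers(0, w - tw + 1))  # random crop origin (col)
            ys, xs = r * th, c * tw
            out_img[ys:ys + th, xs:xs + tw] = images[k][y:y + th, x:x + tw]
            # The image-level label becomes the pixel label of the whole tile.
            out_mask[ys:ys + th, xs:xs + tw] = labels[k]
    return out_img, out_mask
```

Because the mask is built from the same tile layout as the image, the synthesized pair is pixel-accurate by construction, which is what allows a segmentation model to be trained in a fully-supervised way on such data.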

Paper Structure (38 sections, 12 equations, 11 figures, 17 tables)

This paper contains 38 sections, 12 equations, 11 figures, 17 tables.

Introduction
Related Works
Weakly-Supervised Semantic Segmentation
WSSS for Natural Images
WSSS for Histopathological Images
Image Synthesis in Computer Vision
Method
Image-Mixing Synthesis Module
Image-Mixing Synthesis with Mosaic Transformation
Image-Mixing Synthesis with Bézier Mask Generation
Synthesized Image Filtering Module
Segmentation with Consistency Regularization
Iterative Training Strategy
Experiments and Results
Experiment Dataset
...and 23 more sections

Figures (11)

Figure 1: Examples of histopathological images and pseudo-masks generated by CAM. For comparison, ground-truth masks are also provided. "Nearest" and "bilinear" denote the interpolation method used. Blue and red circles highlight the under-activated and over-activated regions, respectively. Black arrows indicate the noise introduced by interpolation. Red pixels represent tumor epithelium and green pixels represent necrosis. These images are from the BCSS dataset [amgad2019structured].
Figure 2: A comparison of image synthesis between natural and histopathological images. Compared with natural images, histopathological images are easier to synthesize due to more uniform colors, milder differences between foreground and background, and homogeneity in image content. The natural images are taken from the ImageNet dataset [deng2009imagenet]. The histopathological images are from the WSSS4LUAD dataset [hanwsssluad].
Figure 3: An overview of the HisynSeg framework. The framework is composed of three modules. In the image-mixing synthesis module, synthesized images and masks are generated by two proposed strategies, namely Mosaic transformation and Bézier mask generation. Next, the synthesized image filtering module selects authentic-looking images from the synthesized images. Finally, a segmentation model is trained in the histopathological image segmentation module. To keep occasional artifacts in the synthesized images from degrading the segmentation model, real images are also fed into the model, which is trained with the proposed consistency regularization under a self-supervised scheme.
Figure 4: The segmentation boundaries between different types of tissues for (a) a real image, (b) a synthesized image by Mosaic transformation, and (c) a synthesized image by Bézier mask generation. The red lines/curves in the images represent the segmentation boundaries.
Figure 5: The architecture of the discriminator, which is based on the backbone of ResNet-18. The last fully connected layer (i.e., $1\times 1$ convolution) of our discriminator is modified to output a 2D vector for real and fake image classification. The kernel shapes and strides of the first convolution in each layer are listed above each block. The output shapes are noted below each block. GAP: Global Average Pooling.
...and 6 more figures
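The curved boundaries that the Figure 4 caption attributes to Bézier mask generation can be illustrated with a rough sketch. This is an assumption-laden example, not the authors' implementation: the helper names, the use of quadratic Bézier segments through control points placed around the image center, and the hard (unblended) compositing of two single-class images are all hypothetical.

```python
import numpy as np


def bezier_closed_curve(ctrl, samples_per_seg=20):
    """Sample a closed smooth curve made of quadratic Bezier segments.

    Segment i runs between the midpoints on either side of control point i
    and uses that control point as the Bezier handle.
    """
    ctrl = np.asarray(ctrl, dtype=float)
    mids = (ctrl + np.roll(ctrl, -1, axis=0)) / 2.0  # midpoint of ctrl[i], ctrl[i+1]
    t = np.linspace(0.0, 1.0, samples_per_seg, endpoint=False)[:, None]
    pts = []
    for i in range(len(ctrl)):
        p0, p1, p2 = mids[i - 1], ctrl[i], mids[i]
        pts.append((1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2)
    return np.concatenate(pts, axis=0)


def polygon_mask(poly, h, w):
    """Rasterize a polygon of (x, y) vertices with the even-odd rule."""
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5  # pixel centers
    inside = np.zeros((h, w), dtype=bool)
    x0, y0 = poly[:, 0], poly[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)
    for ax, ay, bx, by in zip(x0, y0, x1, y1):
        if ay == by:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (ay > py) != (by > py)
        x_at = ax + (py - ay) * (bx - ax) / (by - ay)
        inside ^= crosses & (px < x_at)
    return inside


def bezier_mix(img_a, img_b, label_a, label_b, n_ctrl=6, rng=None):
    """Paste a Bezier-bounded region of img_a onto img_b (hypothetical helper).

    Assumes both images have the same shape and each shows one tissue class.
    """
    rng = np.random.default_rng(rng)
    h, w, _ = img_a.shape
    angles = np.sort(rng.uniform(0.0, 2.0 * np.pi, n_ctrl))
    radii = rng.uniform(0.2, 0.45, n_ctrl) * min(h, w)
    ctrl = np.stack([w / 2 + radii * np.cos(angles),
                     h / 2 + radii * np.sin(angles)], axis=1)
    region = polygon_mask(bezier_closed_curve(ctrl), h, w)
    image = np.where(region[..., None], img_a, img_b)
    mask = np.where(region, label_a, label_b)
    return image, mask
```

The returned mask follows the same curve used to composite the image, so the boundary in the mask is exact for the synthesized image. Whether such an image looks realistic is a separate question, which is the role the captions assign to the synthesized image filtering module.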
