XNet v2: Fewer Limitations, Better Results and Greater Universality

Yanfeng Zhou, Lingrui Li, Zichen Wang, Guole Liu, Ziwen Liu, Ge Yang

TL;DR

XNet v2 achieves state-of-the-art results in semi-supervised segmentation while maintaining competitive results in fully-supervised learning, and excels in scenarios where XNet fails.

Abstract

XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. However, XNet still has limitations, including performance degradation when images lack high-frequency (HF) information, underutilization of raw images, and insufficient fusion. To address these issues, we propose XNet v2, a low- and high-frequency complementary model. XNet v2 performs wavelet-based image-level complementary fusion and feeds the fusion results, along with the raw images, into three different sub-networks to construct a consistency loss. Furthermore, we introduce a feature-level fusion module to enhance the transfer of low-frequency (LF) and HF information. XNet v2 achieves state-of-the-art results in semi-supervised segmentation while maintaining competitive results in fully-supervised learning. More importantly, XNet v2 excels in scenarios where XNet fails. Compared to XNet, XNet v2 exhibits fewer limitations, better results and greater universality. Extensive experiments on three 2D and two 3D datasets demonstrate the effectiveness of XNet v2. Code is available at https://github.com/Yanfeng-Zhou/XNetv2.
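The image-level complementary fusion mentioned in the abstract can be illustrated with a minimal sketch. It uses the notation of the Figure 1 caption ($I_L$, $I_H$, $x^L$, $x^H$, $\alpha$, $\beta$) and rests on several assumptions that are not stated on this page: a single-level Haar wavelet, $I_H$ taken as the sum of the three detail subbands, min-max normalization, and a linear mix $x^L = I_L + \alpha I_H$, $x^H = I_H + \beta I_L$ (chosen so that $\alpha=0$ and $\beta=0$ recover $I_L$ and $I_H$, as in the caption). The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform (assumed wavelet).

    Expects a 2D array with even height and width.
    Returns the approximation subband and three detail subbands.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    d1 = (a + b - c - d) / 2.0  # detail: differences between rows
    d2 = (a - b + c - d) / 2.0  # detail: differences between columns
    d3 = (a - b - c + d) / 2.0  # detail: diagonal differences
    return ll, d1, d2, d3


def normalize(x):
    """Min-max normalize to [0, 1]; a constant input maps to zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def complementary_fusion(img, alpha=0.2, beta=0.2):
    """Hypothetical helper: build I_L, I_H and the fused inputs x^L, x^H."""
    ll, d1, d2, d3 = haar_dwt2(np.asarray(img, dtype=np.float64))
    i_l = normalize(ll)            # LF image I_L
    i_h = normalize(d1 + d2 + d3)  # HF image I_H (assumed: sum of detail subbands)
    x_l = i_l + alpha * i_h        # LF image complemented with some HF content
    x_h = i_h + beta * i_l         # HF image complemented with some LF content
    return i_l, i_h, x_l, x_h
```

Under these assumptions, a larger $\alpha$ restores more edge detail in the LF input, and a larger $\beta$ restores more smooth content in the HF input, which is helpful when an image carries little HF information on its own.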

Paper Structure (12 sections, 6 equations, 3 figures, 11 tables)

Figures (3)

  • Figure 1: Comparison of CAMs of the HF encoder and qualitative results of image-level complementary fusion on CREMI (first row) and ISIC-2017 (second row). (a) Raw image. (b) Ground truth. (c) CAM for the first layer. (d) CAM for the second layer. (e) LF image $I_L$ ($\alpha=0.0$). (f) $x^L$ ($\alpha=0.2$). (g) $x^L$ ($\alpha=0.8$). (h) HF image $I_H$ ($\beta=0.0$). (i) $x^H$ ($\beta=0.2$). (j) $x^H$ ($\beta=0.8$).
  • Figure 2: Overview of XNet v2. XNet v2 consists of a main network $M$, an LF network $L$ and an HF network $H$, which take the raw image $x_i^M$, the LF complementary fusion image $x_i^L$ and the HF complementary fusion image $x_i^H$ as input, respectively. XNet v2 learns from unlabeled images by minimizing $L_{unsup}^{M,L}$ and $L_{unsup}^{M,H}$, and from labeled images by minimizing $L_{sup}^M$, $L_{sup}^L$ and $L_{sup}^H$.
  • Figure 3: Structure of the fusion module, illustrated with the $n$-th layer features of $M$ and $L$.
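
The loss terms named in the Figure 2 caption can be combined into a single training objective. The following is a sketch only: it assumes the terms are simply summed and that a trade-off weight $\lambda$ balances the unsupervised part. Both the summation and $\lambda$ are assumptions and do not appear in the captions above.

```latex
\begin{aligned}
L_{sup}   &= L_{sup}^{M} + L_{sup}^{L} + L_{sup}^{H}, \\
L_{unsup} &= L_{unsup}^{M,L} + L_{unsup}^{M,H}, \\
L_{total} &= L_{sup} + \lambda \, L_{unsup}.
\end{aligned}
```

In this reading, labeled images contribute through all three supervised terms, while unlabeled images contribute only through the two consistency terms that pair $M$ with $L$ and $M$ with $H$.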