Slicer Networks

Hang Zhang, Xiang Chen, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li

TL;DR

The paper presents the Slicer Network, a two-branch architecture that couples a standard encoder with a differentiable, learnable cross-bilateral grid to perform edge-preserving, low-frequency upsampling. By replacing conventional upsampling with a splatting-blurring-slicing pipeline guided by a learned map, the network enlarges the effective receptive field while preserving object boundaries, and it reduces computation on piecewise-smooth medical images. The method is validated on three tasks (unsupervised cardiac cine-MRI registration, keypoints-based lung CT registration, and dermoscopy skin lesion segmentation), where it improves accuracy and efficiency and shows zero-shot capability for keypoints-based registration. These results suggest broad applicability to medical image analysis tasks that exhibit piecewise-smooth structure and boundary detail, with potential for further improvement via adaptive filtering and guidance-map learning.

Abstract

In medical imaging, scans often reveal objects with varied contrasts but consistent internal intensities or textures. This characteristic enables the use of low-frequency approximations for tasks such as segmentation and deformation field estimation. Yet integrating this concept into neural network architectures for medical image analysis remains underexplored. In this paper, we propose the Slicer Network, a novel architecture designed to leverage these traits. It comprises an encoder, which uses models such as vision transformers for feature extraction, and a slicer, which employs a learnable bilateral grid; together they refine and upsample feature maps via a splatting-blurring-slicing process. This introduces an edge-preserving low-frequency approximation of the network output, thereby enlarging the effective receptive field. The enhancement not only reduces computational complexity but also boosts overall performance. Experiments across different medical imaging applications, including unsupervised and keypoints-based image registration and lesion segmentation, have verified the Slicer Network's improved accuracy and efficiency.

Paper Structure (15 sections, 4 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Effective Receptive Field (ERF) visualizations (Luo et al., 2016) across architectures: U-Net (a, d, g, j), Swin-Unet (b), Swin-Slicer (c). Darker and more widely spread regions indicate larger ERFs. Swin-Unet and Swin-Slicer feature maps are presented pre-softmax, while U-Net uses decoder feature maps, with L4 to L1 showing increased spatial sizes via upsampling. Our slicer network enhances the ERF compared to standard U-Net layers.
  • Figure 2: Visual illustration of the traditional three-phase bilateral grid process (splatting, blurring, and slicing) on a 1D signal. Splatting: uses a nearest sampling kernel to transform the image signal, normalized to (0,1), from a dimension of $1 \times 128$ into a $16 \times 10$ sparse spatial-range grid using sampling rates $s_s=8$ and $s_r=0.1$. Blurring: applies Gaussian filters with $\sigma=1$ along both the spatial and range dimensions, corresponding to $\sigma_s=\sigma \times s_s$ and $\sigma_r=\sigma \times s_r$ in the original image space. Slicing: uses a bilinear sampling kernel, normalized via homogeneous coordinates. (a) Depicts the bilateral grid aligned with the input 1D signal, with orange dots showing projected intensities linked by a blue line. (b) Exhibits the Gaussian-blurred grid and filtered output 1D signal; orange dots are sliced intensities, contrasted against the original blue line using a dashed red line.
  • Figure 3: Overview of the Slicer Network framework. The raw image generates a guidance map and an encoder-derived feature map, approximating low-frequency segmentation context. The slicer then refines and upsamples this map, leading to the final segmentation logits via an output head.
  • Figure 4: Visual illustration of the Slicer. Details on the splatting and slicing processes are discussed in Section \ref{sec:splatting_slicing}. The network's reducer, guidance mapping, and blurring components consist of convolutional layers.
  • Figure 5: Trade-off between average Dice (%) and computational complexity for the ACDC dataset, comparing parameter size and multi-add operations (in G) on a log-scaled x-axis. TransMorph is shown in three complexities: tiny, small, and normal. Complexity for FourierNet, LKU-Net, and Res-Slicer is adjusted by varying the network's initial channel count.
  • ...and 3 more figures
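
The splatting-blurring-slicing process described in the Figure 2 caption can be sketched in NumPy as follows. This is a minimal illustration of the classical, non-learned bilateral grid on a 1D signal, not the paper's learnable slicer. It reuses the caption's settings ($s_s=8$, $s_r=0.1$, $\sigma=1$, a $16 \times 10$ grid). The floor-based binning used for the "nearest" splat, the Gaussian truncation radius of 3 cells, the zero padding at grid borders, the noisy step test signal, and all function names are assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=3):
    """Normalized 1D Gaussian taps, truncated at +/- radius grid cells (assumed)."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def blur_axis(grid, kernel, axis):
    """Convolve every 1D slice along `axis`, with zero padding at the borders."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, grid)

def bilinear(grid, gx, gr):
    """Bilinearly sample `grid` at fractional (spatial, range) coordinates."""
    gx = np.clip(gx, 0, grid.shape[0] - 1)
    gr = np.clip(gr, 0, grid.shape[1] - 1)
    x0 = np.minimum(np.floor(gx).astype(int), grid.shape[0] - 2)
    r0 = np.minimum(np.floor(gr).astype(int), grid.shape[1] - 2)
    fx, fr = gx - x0, gr - r0
    return ((1 - fx) * (1 - fr) * grid[x0, r0]
            + fx * (1 - fr) * grid[x0 + 1, r0]
            + (1 - fx) * fr * grid[x0, r0 + 1]
            + fx * fr * grid[x0 + 1, r0 + 1])

def bilateral_grid_filter_1d(signal, s_s=8, s_r=0.1, sigma=1.0):
    """Edge-preserving smoothing of a 1D signal with values in [0, 1]."""
    signal = np.clip(np.asarray(signal, dtype=float), 0.0, 1.0)
    n_s = int(np.ceil(signal.size / s_s))   # spatial cells (16 for 128 samples)
    n_r = int(round(1.0 / s_r))             # range cells (10 for s_r = 0.1)
    x = np.arange(signal.size)

    # Splatting: accumulate each sample into its (spatial, range) cell.
    # `data` holds summed intensities, `weight` holds sample counts; together
    # they form the homogeneous coordinates used for normalization later.
    i = x // s_s
    r = np.minimum(np.floor(signal / s_r).astype(int), n_r - 1)
    data = np.zeros((n_s, n_r))
    weight = np.zeros((n_s, n_r))
    np.add.at(data, (i, r), signal)
    np.add.at(weight, (i, r), 1.0)

    # Blurring: separable Gaussian along the spatial axis, then the range axis.
    k = gaussian_kernel(sigma)
    for axis in (0, 1):
        data = blur_axis(data, k, axis)
        weight = blur_axis(weight, k, axis)

    # Slicing: read the grid back at each sample's own position and intensity,
    # then divide by the interpolated weight (homogeneous normalization).
    gx = (x + 0.5) / s_s - 0.5    # cell-centred spatial coordinate
    gr = signal / s_r - 0.5       # cell-centred range coordinate
    num = bilinear(data, gx, gr)
    den = bilinear(weight, gx, gr)
    return num / np.maximum(den, 1e-12), data, weight

# Hypothetical test signal: a noisy step from 0.2 to 0.8 at the midpoint.
rng = np.random.default_rng(0)
signal = np.where(np.arange(128) < 64, 0.2, 0.8) + rng.normal(0, 0.02, 128)
filtered, data, weight = bilateral_grid_filter_1d(signal)
```

Because the two plateaus of the step fall into range cells that are further apart than the truncated Gaussian reaches, the blur mixes samples within each plateau but not across the edge. This is the edge-preserving, low-frequency behaviour that the slicer is described as exploiting, here with fixed rather than learned filters.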