Tiling artifacts and trade-offs of feature normalization in the segmentation of large biological images

Elena Buglakova, Anwai Archit, Edoardo D'Imprima, Julia Mahamid, Constantin Pape, Anna Kreshuk

TL;DR

This work traces tiling artifacts in sliding-window segmentation of very large biological images to tile-wise feature normalization. It introduces two practical indicators, tile mismatch and train/eval disparity, to diagnose normalization issues, and evaluates several normalization strategies across CNN and ViT-based architectures. The study finds that tile-wise normalization (InstanceNorm) yields artifacts, while global normalization (BatchNorm) avoids stitching artifacts but suffers from train/eval disparity; BatchRenorm provides artifact-free stitching without sacrificing accuracy, including in transfer tasks. These findings offer actionable guidance for deploying scalable segmentation models on large biological datasets, improving both stitching quality and cross-dataset reuse.

Abstract

Segmentation of very large images is a common problem in microscopy, medical imaging or remote sensing. The problem is usually addressed by sliding window inference, which can theoretically lead to seamlessly stitched predictions. However, in practice many of the popular pipelines still suffer from tiling artifacts. We investigate the root cause of these issues and show that they stem from the normalization layers within the neural networks. We propose indicators to detect normalization issues and further explore the trade-offs between artifact-free and high-quality predictions, using three diverse microscopy datasets as examples. Finally, we propose to use BatchRenorm as the most suitable normalization strategy, which effectively removes tiling artifacts and enhances transfer performance, thereby improving the reusability of trained networks for new datasets.
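The link between tile-wise normalization and stitching artifacts can be illustrated with a toy example. The sketch below is not the paper's pipeline or its tile mismatch metric: it normalizes raw pixel values rather than network features, and the helper names (`normalize_tilewise`, `normalize_fixed`, `overlap_mismatch`), tile size and overlap are all hypothetical choices made for illustration. It only shows that statistics computed per tile give two neighbouring tiles different values for the same pixels, while fixed statistics do not.

```python
import numpy as np

def normalize_tilewise(tile, eps=1e-5):
    """InstanceNorm-style: mean and variance come from the tile itself."""
    return (tile - tile.mean()) / np.sqrt(tile.var() + eps)

def normalize_fixed(tile, mean, var, eps=1e-5):
    """BatchNorm-in-eval-style: mean and variance are fixed in advance."""
    return (tile - mean) / np.sqrt(var + eps)

def overlap_mismatch(image, normalize, tile=64, overlap=16):
    """Mean absolute difference between two neighbouring tiles
    on the columns they share (hypothetical helper, toy setting)."""
    start = tile - overlap
    left = normalize(image[:, :tile])
    right = normalize(image[:, start:start + tile])
    return float(np.abs(left[:, -overlap:] - right[:, :overlap]).mean())

rng = np.random.default_rng(0)
# Toy image: a horizontal intensity ramp plus noise, so that
# neighbouring tiles have clearly different statistics.
ramp = np.linspace(0.0, 1.0, 112)[None, :] * np.ones((64, 1))
image = ramp + 0.05 * rng.standard_normal((64, 112))

mismatch_tilewise = overlap_mismatch(image, normalize_tilewise)

mean, var = image.mean(), image.var()
mismatch_fixed = overlap_mismatch(
    image, lambda t: normalize_fixed(t, mean, var)
)

print("tile-wise statistics:", mismatch_tilewise)
print("fixed statistics:    ", mismatch_fixed)
```

With fixed statistics the shared pixels receive identical values in both tiles, so the mismatch is zero; with per-tile statistics the two tiles disagree on the overlap. In a real network the same effect would occur at every normalization layer, on feature maps rather than on the input.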

Paper Structure

This paper contains 19 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of the pipeline for processing images larger than GPU memory: random sampling of tiles during training and sliding window inference. Example of artifacts caused by tiling: hallucinations in the low signal areas and discontinuous predictions at the tile borders.
  • Figure 2: Examples of predictions with and without tiling artifacts.
  • Figure 3: Illustration of the tile mismatch metric.
  • Figure 4: Receptive field, shown as the log10 of the mean gradient of the central pixel output with respect to the input.
  • Figure 5: Tiling artifacts become more pronounced in a transfer setting. For quantitative evaluation see STab. 1.
  • ...and 2 more figures