Median2Median: Zero-shot Suppression of Structured Noise in Images

Jianxu Wang, Ge Wang

TL;DR

Median2Median (M2M) tackles denoising under structured, directionally correlated noise in a zero-shot setting, eliminating the need for external training data. It constructs pseudo-independent training pairs from a single noisy image by combining directional interpolation with generalized median filtering, followed by a randomized assignment so that the pairs satisfy the Noise2Noise independence assumption. Nine lightweight networks are trained in parallel on sub-image pairs, guided by a symmetric loss and a consistency loss, with a de-structured input workflow to make learning more robust. M2M performs on par with state-of-the-art zero-shot methods under i.i.d. noise and consistently outperforms them under correlated noise. It is a data-free approach that extends zero-shot denoising beyond the strict i.i.d. assumption to settings with structured artifacts.

Abstract

Image denoising is a fundamental problem in computer vision and medical imaging. However, real-world images are often degraded by structured noise with strong anisotropic correlations that existing methods struggle to remove. Most data-driven approaches rely on large datasets with high-quality labels and still suffer from limited generalizability, whereas existing zero-shot methods avoid this limitation but remain effective only for independent and identically distributed (i.i.d.) noise. To address this gap, we propose Median2Median (M2M), a zero-shot denoising framework designed for structured noise. M2M introduces a novel sampling strategy that generates pseudo-independent sub-image pairs from a single noisy input. This strategy leverages directional interpolation and generalized median filtering to adaptively exclude values distorted by structured artifacts. To further enlarge the effective sampling space and eliminate systematic bias, a randomized assignment strategy is employed, ensuring that the sampled sub-image pairs are suitable for Noise2Noise training. In our realistic simulation studies, M2M performs on par with state-of-the-art zero-shot methods under i.i.d. noise, while consistently outperforming them under correlated noise. These findings establish M2M as an efficient, data-free solution for structured noise suppression and mark the first step toward effective zero-shot denoising beyond the strict i.i.d. assumption.
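
The sampling idea can be illustrated with a minimal NumPy sketch. It assumes the first-order variant described in the Figure 2 caption (averaging the symmetric neighbor pair along $0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$ in a $3 \times 3$ window) and uses a plain median over the four directional averages as a stand-in for the generalized median filter, whose exact definition is not given on this page. The function name `directional_median` and the reflect padding at the borders are illustrative choices, not the authors' implementation.

```python
import numpy as np

def directional_median(img):
    """Hypothetical sketch: per-pixel median of four directional averages.

    For each pixel, average the two symmetric neighbors along each of the
    four directions in a 3x3 window, then take the median of those four
    averages. The pixel's own value is never used.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Reflect padding is an assumption made here to handle image borders.
    p = np.pad(img, 1, mode="reflect")
    # (dy, dx) offset of one neighbor per direction; its partner is (-dy, -dx).
    offsets = [(0, 1), (-1, 1), (1, 0), (1, 1)]  # 0, 45, 90, 135 degrees
    candidates = []
    for dy, dx in offsets:
        a = p[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        b = p[1 - dy : 1 - dy + h, 1 - dx : 1 - dx + w]
        candidates.append(0.5 * (a + b))
    # With four candidates, np.median averages the two middle values.
    return np.median(np.stack(candidates), axis=0)

# Toy input: a flat image with one horizontal streak on row 2.
streak = np.zeros((5, 5))
streak[2, :] = 1.0
sampled = directional_median(streak)
```

In this toy case only the $0^\circ$ average runs along the streak, so the median over the four directions discards it and the streak row in `sampled` is zero. This is the intuition behind excluding values distorted by directionally correlated artifacts.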

Paper Structure

This paper contains 14 sections, 24 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Overview of the proposed M2M framework. The top box illustrates M2M sampling at the center position in the first patch, where directional interpolation and median filtering are applied to obtain de-structured pixel values from an image corrupted by structured noise. Below the top box are the training and inference pipelines, where the solid arrows indicate the training process and the dashed arrows denote the inference process. The bottom right panel shows the sampling space ($3^N$), the training set ($2n$), and the testing set ($2k$).
  • Figure 2: Sampling and interpolation processes. (a) Comparison of pixel-wise and block-wise sampling strategies: pixel-wise sampling moves a $3 \times 3$ window with stride $1$, so adjacent windows overlap, whereas block-wise sampling partitions the image into non-overlapping $3 \times 3$ patches, each with nine predefined sampling positions: top-left (TL), top (T), top-right (TR), left (L), center (C), right (R), bottom-left (BL), bottom (B), and bottom-right (BR). (b) A $3 \times 3$ interpolation window used for directional interpolation, illustrated with the center sampling position as an example. Zero-order interpolation directly takes values from neighboring pixels, while first-order interpolation averages symmetric neighbors along each of the four predefined directions ($0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$).
  • Figure 3: Sampling results from a fluorescence microscopic image corrupted by horizontally correlated noise. From left to right: the clean reference, the noisy input, sampling with N2F using checkerboard sampling followed by a squeeze-left operation, sampling with ZS-N2N using $2 \times 2$ diagonal averaging, and sampling with M2M using zero-order interpolation with four-neighborhood, zero-order interpolation with eight-neighborhood, and first-order interpolation. The second row shows the residuals relative to their clean counterparts. It can be observed that N2F and ZS-N2N retain strong horizontal structures inherited from the correlated noise in their sampling results, whereas M2M sampling effectively suppresses such directional interference.
  • Figure 4: Three-layer convolutional neural network denoiser. The first two layers employ $3\times 3$ convolutions followed by PReLU, while the final layer is a $1\times 1$ convolution that produces the denoised output. The image dimensions at each stage are indicated below the arrows.
  • Figure 5: Visual comparison of representative results obtained for synthetic one-dimensionally correlated noise. (a) A Kodak image with vertically correlated noise ($\sigma_n = 0.15$, $\ell = 3$). (b) A fluorescence microscopic image with horizontally correlated noise ($\sigma_n = 0.15$, $\ell = 5$). For M2M, “M2M-1” denotes the version using first-order interpolation, whereas “M2M-0 (4N)” and “M2M-0 (8N)” denote the versions using zero-order interpolation with four and eight neighbors, respectively. The power spectral density (PSD) of the synthetic noise is shown in the bottom-left corner of each panel, highlighting the directional correlation introduced by structured noise. Quantitative results are reported as PSNR (dB)/SSIM.
  • ...and 1 more figures
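
The block-wise partition described in the Figure 2 caption can be sketched in a few lines of NumPy. The sketch below only shows how the nine predefined positions of each non-overlapping $3 \times 3$ patch map to nine sub-images of one third the resolution. It does not perform the interpolation or median steps, and the name `blockwise_subimages` and the cropping of images whose sides are not multiples of 3 are assumptions made for illustration.

```python
import numpy as np

# Position labels in row-major order within a 3x3 patch (from the caption).
POSITIONS = ["TL", "T", "TR", "L", "C", "R", "BL", "B", "BR"]

def blockwise_subimages(img):
    """Hypothetical sketch: split an image into nine position-wise sub-images.

    The image is partitioned into non-overlapping 3x3 patches; sub-image
    `name` collects the pixel at that position from every patch.
    """
    img = np.asarray(img)
    h, w = img.shape
    # Assumption: crop so both sides are multiples of 3.
    h3, w3 = h - h % 3, w - w % 3
    # blocks[a, i, b, j] == img[3 * a + i, 3 * b + j]
    blocks = img[:h3, :w3].reshape(h3 // 3, 3, w3 // 3, 3)
    return {
        name: blocks[:, k // 3, :, k % 3]
        for k, name in enumerate(POSITIONS)
    }

# Usage: a 6x6 image yields nine 2x2 sub-images.
demo = np.arange(36).reshape(6, 6)
subs = blockwise_subimages(demo)
center = subs["C"]  # equivalent to demo[1::3, 1::3]
```

Because the patches do not overlap, each position contributes exactly one pixel per patch, which is what distinguishes block-wise sampling from the stride-1 pixel-wise alternative in the caption.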