TM-BSN: Triangular-Masked Blind-Spot Network for Real-World Self-Supervised Image Denoising

Junyoung Park, Youngjin Oh, Nam Ik Cho

Abstract

Blind-spot networks (BSNs) enable self-supervised image denoising by preventing access to the target pixel, allowing clean signal estimation without ground-truth supervision. However, this approach assumes pixel-wise noise independence, which is violated in real-world sRGB images due to spatially correlated noise from the camera's image signal processing (ISP) pipeline. While several methods employ downsampling to decorrelate noise, they alter noise statistics and limit the network's ability to utilize full contextual information. In this paper, we propose the Triangular-Masked Blind-Spot Network (TM-BSN), a novel blind-spot architecture that accurately models the spatial correlation of real sRGB noise. This correlation originates from demosaicing, where each pixel is reconstructed from neighboring samples with spatially decaying weights, resulting in a diamond-shaped pattern. To align the receptive field with this geometry, we introduce a triangular-masked convolution that restricts the kernel to its upper-triangular region, creating a diamond-shaped blind spot at the original resolution. This design excludes correlated pixels while fully leveraging uncorrelated context, eliminating the need for downsampling or post-processing. Furthermore, we use knowledge distillation to transfer complementary knowledge from multiple blind-spot predictions into a lightweight U-Net, improving both accuracy and efficiency. Extensive experiments on real-world benchmarks demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing self-supervised approaches. Our code is available at https://github.com/parkjun210/TM-BSN.
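To make the blind-spot geometry concrete, here is a minimal sketch. It is not the authors' implementation, and all function names are hypothetical. It assumes that "upper-triangular" means kernel taps on or above the main diagonal, that the feature-map shift moves a branch's receptive field by s columns, and that the four branches are 90° rotations of one another. The code tracks only which input offsets (dy, dx) can reach the output pixel, then checks that the offsets no branch sees form the diamond |dy| + |dx| < s.

```python
def upper_triangular_mask(k):
    """k x k binary mask keeping taps on or above the main diagonal (assumed definition)."""
    return [[1 if col >= row else 0 for col in range(k)] for row in range(k)]


def receptive_offsets(k, layers):
    """Input offsets (dy, dx) reachable after stacking `layers` masked k x k convolutions."""
    half = k // 2
    taps = [(r - half, c - half)
            for r, row in enumerate(upper_triangular_mask(k))
            for c, keep in enumerate(row) if keep]
    reach = {(0, 0)}
    for _ in range(layers):
        reach = {(y + dy, x + dx) for (y, x) in reach for (dy, dx) in taps}
    return reach


def shift(offsets, s):
    """Model a feature-map shift: every visible offset moves s columns to the right."""
    return {(dy, dx + s) for (dy, dx) in offsets}


def rotate90(offsets):
    """Rotate a set of offsets by 90 degrees about the target pixel."""
    return {(-dx, dy) for (dy, dx) in offsets}


def blind_spot(k, layers, s):
    """Offsets inside the analysis window that none of the four rotated branches can see."""
    wedge = shift(receptive_offsets(k, layers), s)
    seen = set()
    for _ in range(4):
        seen |= wedge
        wedge = rotate90(wedge)
    r = layers * (k // 2)  # window radius fully covered by the unshifted receptive field
    window = {(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)}
    return window - seen


print(sorted(blind_spot(k=3, layers=6, s=2)))
# [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]
```

Under these assumptions the size of the blind spot is set by s alone: s = 1 hides only the target pixel, and larger values widen the diamond while every pixel outside it stays visible to at least one branch.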

Paper Structure

This paper contains 14 sections, 3 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Visualization of Effective Receptive Fields (ERFs) of different Blind-Spot Networks (BSNs): (a) AP-BSN, (b) LG-BPN, (c) AT-BSN, and (d) the proposed TM-BSN. Each ERF is computed by backpropagating the gradient of the central output pixel to the input pixels [erf].
  • Figure 2: (a) The demosaicing filter assigns higher weights to spatially closer samples during color reconstruction. (b) Real noise exhibits a diamond-shaped correlation pattern with respect to relative distance.
  • Figure 3: Illustration of receptive field expansion and blind-spot formation using the proposed triangular-masked convolution. The receptive field progressively expands in a triangular shape through stacked masked convolutions, and a feature-map shift is subsequently applied to introduce a central blind-spot.
  • Figure 4: Overview of the proposed Triangular-Masked Blind-Spot Network (TM-BSN) architecture. R rotates the input image by $0^\circ$, $90^\circ$, $180^\circ$, and $270^\circ$ and extracts features through TM-BSN; S applies feature-map shifts by a given offset $s$ to generate blind spots; and R$^{-1}$ unrotates the four branches, concatenates them along the channel dimension, and applies a $1{\times}1$ convolution to produce the final output.
  • Figure 5: Qualitative comparison on the SIDD Validation dataset [sidd].
  • ...and 3 more figures
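
The ERF measurement described in the Figure 1 caption can be illustrated on a toy model. The sketch below is a stand-in under stated assumptions: it uses central finite differences instead of backpropagation (for a linear network the two agree up to rounding), and a hypothetical `toy_net` of two stacked upper-triangular 3×3 convolutions with unit weights rather than any of the networks shown in the figure.

```python
def masked_conv(img, mask):
    """Zero-padded 'same' cross-correlation of a square 2D list with a k x k weight mask."""
    n, k = len(img), len(mask)
    half = k // 2
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - half, x + j - half
                    if 0 <= yy < n and 0 <= xx < n:
                        total += mask[i][j] * img[yy][xx]
            out[y][x] = total
    return out


def toy_net(img, layers=2):
    """Hypothetical linear stand-in for one branch: stacked upper-triangular 3x3 convolutions."""
    mask = [[1.0 if j >= i else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(layers):
        img = masked_conv(img, mask)
    return img


def erf(net, n, eps=1e-3):
    """Central-difference estimate of d out[center] / d in[y][x] for every input pixel."""
    c = n // 2
    grad = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            hi = [[0.0] * n for _ in range(n)]
            lo = [[0.0] * n for _ in range(n)]
            hi[y][x], lo[y][x] = eps, -eps
            grad[y][x] = (net(hi)[c][c] - net(lo)[c][c]) / (2 * eps)
    return grad


# Render the ERF of the toy network: '#' marks input pixels that influence the center output.
for row in erf(toy_net, 7):
    print("".join("#" if abs(g) > 0.5 else "." for g in row))
```

In this toy setting the nonzero gradients fill a triangle on one side of the diagonal through the center pixel, which is the kind of shape an ERF plot makes visible.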