Multi-Scale Denoising in the Feature Space for Low-Light Instance Segmentation

Joanne Lin, Nantheera Anantrasirichai, David Bull

TL;DR

The paper tackles instance segmentation in low-light conditions by integrating weighted non-local blocks (wNLB) into backbone feature extractors to denoise in the feature space, enabling end-to-end training without aligned ground-truth data. wNLB introduces a learnable weight per layer to adapt denoising to multi-scale noise, described by the operation $z = w W_z y + (1-w) x$, where $x$ is the block's input feature map, $y$ is the output of the non-local operation, $W_z$ is a learned projection, and $w$ is the learnable weight. Trained on a synthetic low-light COCO dataset, the approach yields consistent improvements over pretrained and finetuned baselines across multiple detectors and is also effective on real low-light data, suggesting broad applicability to low-light vision tasks. The work highlights that end-to-end denoising in the feature space can outperform two-stage LLIE-plus-detection pipelines and offers a practical path toward robust low-light instance segmentation in real-world scenarios.
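The weighted blend in the wNLB operation can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the non-local step here is assumed to be a plain dot-product softmax over positions with no learned embeddings, `W_z` is a fixed matrix rather than a trained layer, `w` is a fixed scalar rather than a learned parameter, and the function names are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def non_local(x):
    # Assumed simplification: y_i = sum_j softmax_j(x_i . x_j) * x_j,
    # where x is a list of positions, each a list of channel values.
    y = []
    for xi in x:
        attn = softmax([sum(a * b for a, b in zip(xi, xj)) for xj in x])
        y.append([sum(a * xj[c] for a, xj in zip(attn, x))
                  for c in range(len(xi))])
    return y

def weighted_non_local_block(x, W_z, w):
    # z = w * W_z y + (1 - w) * x, applied independently at each position.
    y = non_local(x)
    z = []
    for xi, yi in zip(x, y):
        proj = [sum(W_z[r][c] * yi[c] for c in range(len(yi)))
                for r in range(len(W_z))]
        z.append([w * p + (1.0 - w) * xv for p, xv in zip(proj, xi)])
    return z

# Three positions with two channels each (illustrative values only).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
z = weighted_non_local_block(features, identity, w=0.5)
```

With `w = 0` the block passes the input features through unchanged, and with `w = 1` it outputs only the projected non-local response, so `w` sets how strongly each layer relies on the denoised signal.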

Abstract

Instance segmentation for low-light imagery remains largely unexplored due to the challenges imposed by such conditions, for example shot noise due to low photon count, color distortions and reduced contrast. In this paper, we propose an end-to-end solution to address this challenging task. Our proposed method implements weighted non-local blocks (wNLB) in the feature extractor. This integration enables an inherent denoising process at the feature level. As a result, our method eliminates the need for aligned ground-truth images during training, thus supporting training on real-world low-light datasets. We introduce additional learnable weights at each layer in order to enhance the network's adaptability to real-world noise characteristics, which affect different feature scales in different ways. Experimental results on several object detectors show that the proposed method outperforms the pretrained networks with an Average Precision (AP) improvement of at least +7.6, with the introduction of wNLB further enhancing AP by up to +1.3.

Paper Structure

This paper contains 12 sections, 1 equation, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Visual comparison of low-light instance segmentation for different methods using Mask R-CNN (He et al., 2017) as the detector.
  • Figure 2: Generic architecture showing our proposed weighted non-local blocks added into the backbone to remove noise in the feature space. Blue blocks indicate convolutional layers.
  • Figure 3: Our proposed weighted non-local block (wNLB) for feature denoising with learnable weight $w$.
  • Figure 4: Comparison of shallow and deep features extracted from our proposed method against pre-trained and finetuned Mask R-CNN (He et al., 2017) models.
  • Figure 5: Visual comparison of our proposed method against pre-trained and finetuned Mask R-CNN (He et al., 2017) models, along with the ground truth, for cases of varying levels of difficulty.
  • ...and 2 more figures