Bilateral Reference for High-Resolution Dichotomous Image Segmentation

Peng Zheng, Dehong Gao, Deng-Ping Fan, Li Liu, Jorma Laaksonen, Wanli Ouyang, Nicu Sebe

TL;DR

This work introduces BiRefNet, a unified framework for high-resolution dichotomous image segmentation that decomposes the task into localization and reconstruction modules guided by a bilateral reference. The bilateral reference combines an inward source-image patch strategy with outward gradient supervision to preserve fine details and enhance boundary precision, complemented by targeted training strategies for DIS. Across DIS, HRSOD, COD, and SOD benchmarks, BiRefNet achieves state-of-the-art results and demonstrates strong generalization and efficiency. The approach offers practical insights for HR segmentation and provides avenues for rapid deployment and broader application in real-world scenarios.

Abstract

We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance focus on regions with finer details. Furthermore, we outline practical training strategies tailored for DIS to improve both map quality and the training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks, showing that BiRefNet outperforms task-specific cutting-edge methods across all benchmarks. Our code is available at https://github.com/ZhengPeng7/BiRefNet.
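
The two references named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `image_to_patches` and `gradient_map`, the channel-stacking layout, and the finite-difference gradient operator are all assumptions made for illustration. The first function splits a full-resolution image into a grid of patches stacked along the channel axis, so that it matches the spatial size of a smaller decoder feature map (source reference). The second builds a gradient-magnitude map, optionally restricted to a foreground mask (target reference).

```python
import numpy as np


def image_to_patches(image, grid_h, grid_w):
    """Split a (C, H, W) image into grid_h x grid_w patches stacked on channels.

    Hypothetical helper: the output has shape
    (C * grid_h * grid_w, H // grid_h, W // grid_w), so no pixel is
    downsampled away; detail is moved into the channel axis instead.
    """
    c, h, w = image.shape
    if h % grid_h != 0 or w % grid_w != 0:
        raise ValueError("image size must be divisible by the grid size")
    ph, pw = h // grid_h, w // grid_w
    patches = image.reshape(c, grid_h, ph, grid_w, pw)
    patches = patches.transpose(0, 1, 3, 2, 4)  # (c, grid_h, grid_w, ph, pw)
    return patches.reshape(c * grid_h * grid_w, ph, pw)


def gradient_map(gray, mask=None):
    """Gradient magnitude of a (H, W) grayscale image via finite differences.

    Assumption: np.gradient stands in for whatever gradient operator the
    paper uses. If a mask is given, gradients outside it are zeroed.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    if mask is not None:
        magnitude = magnitude * (mask > 0)
    return magnitude


if __name__ == "__main__":
    image = np.arange(3 * 8 * 8, dtype=np.float64).reshape(3, 8, 8)
    patches = image_to_patches(image, grid_h=2, grid_w=2)
    print(patches.shape)  # 3 channels * 4 patches, each patch 4 x 4

    gray = np.zeros((6, 6))
    gray[:, 3:] = 1.0  # vertical step edge
    print(gradient_map(gray)[0])
```

Stacking patches on the channel axis keeps every original pixel available to the decoder stage, which is the point of using the image at its original scale rather than a resized copy.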

Paper Structure (20 sections, 1 equation, 9 figures, 7 tables)

This paper contains 20 sections, 1 equation, 9 figures, and 7 tables.

Figures (9)

  • Figure 1: Visual comparison between the results of our proposed BiRefNet and the latest state-of-the-art methods (e.g., IS-Net [DIS5K] and UDUN [UDUN]) for high-resolution dichotomous image segmentation (DIS). Segmentation details are zoomed in for better display.
  • Figure 2: Comparison between our proposed BiRefNet and other existing methods for HR segmentation tasks. (a) Common framework [UNet]; (b) Image pyramid as input [image_pyramid_2, zhao2018icnet]; (c) Scaled images as inward reference [InSPyReNet, PENet]; (d) BiRefNet: patches of original images at original scales as inward reference and gradient priors as outward reference. Enc = encoder, Dec = decoder.
  • Figure 3: Pipeline of the proposed bilateral reference network (BiRefNet). BiRefNet mainly consists of the localization module (LM) and the reconstruction module (RM) with bilateral reference (BiRef) blocks. Please refer to the overview section (sec:overview) for details.
  • Figure 4: Pipeline of the proposed bilateral reference blocks. The source images at the original scale are combined with decoder features as the inward reference and fed into the reconstruction block, where deformable convolutions with hierarchical receptive fields are employed. The aggregated features are then used to predict the gradient maps in the outward reference. Gradient-aware features are then turned into the attention map to act on the original features.
  • Figure 5: Quantitative comparisons of the proposed BiRefNet and the best task-specific models. S-measure [Smeasure] is used for the comparison here. UDUN [UDUN], FSPNet [FSPNet], PGNet-UH [PGNet], and PGNet-DH [PGNet] are currently the best models for the DIS, COD, HRSOD, and SOD tasks, respectively.
  • ...and 4 more figures
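
The last step of the Figure 4 caption, where gradient-aware features are turned into an attention map that acts on the original features, can be sketched roughly as follows. This is an illustrative approximation only: the function name `gradient_attention`, the fixed `weights` vector standing in for a learned 1x1 projection, and the sigmoid gating are assumptions, not details confirmed by the caption.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gradient_attention(features, grad_features, weights):
    """Gate decoder features with an attention map from gradient-aware features.

    features:      (C, H, W) original decoder features
    grad_features: (G, H, W) gradient-aware features
    weights:       (G,) hypothetical stand-in for a learned 1x1 projection
    Returns the gated features (C, H, W) and the attention map (H, W).
    """
    # Project the G gradient channels down to a single-channel logit map.
    logits = np.tensordot(weights, grad_features, axes=([0], [0]))
    attention = sigmoid(logits)
    # Broadcast the (H, W) attention map over all C feature channels.
    return features * attention[None, :, :], attention


if __name__ == "__main__":
    feats = np.ones((4, 5, 5))
    grad_feats = np.zeros((2, 5, 5))
    grad_feats[:, 2, :] = 10.0  # strong gradient response along one row
    gated, attn = gradient_attention(feats, grad_feats, np.array([0.5, 0.5]))
    print(attn[2, 0], attn[0, 0])
```

In this sketch, locations with a strong gradient response keep most of their feature magnitude while flat regions are attenuated, which is one simple way to bias a decoder toward fine boundaries.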