Towards Diverse Binary Segmentation via A Simple yet General Gated Network

Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Lei Zhang

TL;DR

This work addresses diverse binary segmentation by coupling a gating mechanism, which controls the information passed from each encoder level to the decoder, with a Fold-ASPP module for robust multi-scale context. The GateNet architecture combines multi-level gate units with a dual-branch decoder to suppress background interference while preserving and combining high-level and fine-grained features. Folded atrous convolutions, embedded in ASPP (Fold-ASPP), strengthen multi-scale feature extraction, and a two-stream extension enables RGB-D fusion. Experiments on 33 datasets covering 10 binary segmentation tasks show that GateNet performs favorably against 42 state-of-the-art methods under 10 metrics, making it a strong general baseline. The approach offers a practical, generalizable solution for diverse binary segmentation tasks and highlights the benefits of gated inter-layer communication and local-in-local context modeling.
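The gating idea can be illustrated with a minimal sketch. It assumes a per-position gate: a single linear map over the concatenated encoder feature (E^i after its transition layer) and decoder feature (D^{i+1}, already resized to the same size) stands in for the paper's convolution, the sigmoid S turns the score into a value in (0, 1), and that value scales the encoder feature before it reaches the decoder. The name gate_unit, the flattened feature-map layout and the single linear map are hypothetical simplifications, not the authors' layer configuration.

```python
import math
from typing import List

Vector = List[float]       # channel vector at one spatial position
FeatureMap = List[Vector]  # H*W positions, flattened row-major


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (the "S" in the gate unit).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_unit(enc_map: FeatureMap, dec_map: FeatureMap,
              weights: Vector, bias: float) -> FeatureMap:
    """Hypothetical gated skip connection.

    enc_map: encoder feature E^i (after its transition layer).
    dec_map: previous decoder feature D^{i+1}, resized to the same H*W.
    weights/bias: a linear map over concat(enc, dec), standing in for a conv.
    """
    if len(enc_map) != len(dec_map):
        raise ValueError("encoder and decoder maps must have the same size")
    gated = []
    for enc, dec in zip(enc_map, dec_map):
        concat = enc + dec  # channel-wise concatenation at this position
        if len(concat) != len(weights):
            raise ValueError("weights must match the concatenated channels")
        score = sum(w * x for w, x in zip(weights, concat)) + bias
        g = sigmoid(score)                  # gate value in (0, 1)
        gated.append([g * e for e in enc])  # scale the encoder feature
    return gated


# Usage: only the decoder channels drive the gate (encoder weights are 0).
enc = [[1.0, 2.0], [1.0, 2.0]]
dec = [[10.0, 10.0], [-10.0, -10.0]]  # position 0 favours "keep", position 1 "suppress"
w = [0.0, 0.0, 1.0, 1.0]
out = gate_unit(enc, dec, w, 0.0)
```

Because the gate depends on the decoder feature, the same encoder response is passed at one position and suppressed at another depending on higher-level context, which is the interference-control role attributed to the gate units.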

Abstract

In many binary segmentation tasks, most CNN-based methods use a U-shaped encoder-decoder network as their basic structure. They ignore two key problems when the encoder exchanges information with the decoder: the lack of a mechanism for controlling interference between them, and the disparity in the contributions of different encoder levels. In this work, we propose a simple yet general gated network (GateNet) to tackle both at once. With the help of multi-level gate units, valuable context information from the encoder can be selectively transmitted to the decoder. In addition, we design a gated dual-branch structure to build cooperation among features of different levels and improve the discrimination ability of the network. Furthermore, we introduce a "Fold" operation to improve atrous convolution, forming a novel folded atrous convolution that can be flexibly embedded in ASPP or DenseASPP to accurately localize foreground objects of various scales. GateNet can be easily generalized to many binary segmentation tasks, including general and specific object segmentation and multi-modal segmentation. Without bells and whistles, our network consistently performs favorably against state-of-the-art methods under 10 metrics on 33 datasets of 10 binary segmentation tasks.
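One way to picture the "Fold" operation is as a 2×2 space-to-depth rearrangement: the map is split into non-overlapping local patches that are stacked along the channel axis, an atrous convolution runs on the folded map, and the result is unfolded back. The sketch below assumes exactly that (patch size k = 2, a single-channel map, no padding). The helpers fold, unfold and atrous_taps are hypothetical, and the paper's actual fold parameters may differ.

```python
from typing import List, Tuple

Grid = List[List[float]]  # single-channel H x W feature map


def fold(x: Grid, k: int = 2) -> List[Grid]:
    """Space-to-depth: split H x W into non-overlapping k x k patches and stack
    them as k*k channels of size (H/k) x (W/k); channel index = dy * k + dx."""
    h, w = len(x), len(x[0])
    if h % k or w % k:
        raise ValueError("H and W must be divisible by k")
    return [
        [[x[i * k + dy][j * k + dx] for j in range(w // k)] for i in range(h // k)]
        for dy in range(k)
        for dx in range(k)
    ]


def unfold(chans: List[Grid], k: int = 2) -> Grid:
    """Inverse of fold: write each channel back to its original positions."""
    hk, wk = len(chans[0]), len(chans[0][0])
    out = [[0.0] * (wk * k) for _ in range(hk * k)]
    for c, ch in enumerate(chans):
        dy, dx = divmod(c, k)
        for i in range(hk):
            for j in range(wk):
                out[i * k + dy][j * k + dx] = ch[i][j]
    return out


def atrous_taps(i: int, j: int, rate: int, k: int = 2) -> List[Tuple[int, int]]:
    """Original-map coordinates read by a 3x3 atrous kernel with dilation `rate`
    centred at folded position (i, j). Each tap covers a whole k x k patch.
    Border handling (padding) is ignored, so coordinates may fall outside the map."""
    taps = []
    for a in (-1, 0, 1):
        for b in (-1, 0, 1):
            fi, fj = i + a * rate, j + b * rate
            taps.extend((fi * k + dy, fj * k + dx) for dy in range(k) for dx in range(k))
    return taps


# Usage: a 4 x 4 map folds into four 2 x 2 channels and unfolds losslessly.
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
folded = fold(x)
restored = unfold(folded)
```

Under these assumptions, atrous_taps shows the local-in-local effect: each of the nine kernel taps on the folded map covers a whole k × k patch of the original map, so the kernel gathers dense local detail at sparse, dilated locations.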

Paper Structure (24 sections, 16 equations, 24 figures, 19 tables)

Figures (24)

  • Figure 1: Some meaningful binary segmentation tasks.
  • Figure 2: Overall architecture of the gated network. It consists of five encoder blocks ($\mathbf{E}^1 \sim \mathbf{E}^5$), five transition layers ($\mathbf{T}^1 \sim \mathbf{T}^5$), five gate units ($\mathbf{G}^1 \sim \mathbf{G}^5$), five decoder blocks ($\mathbf{D}^1 \sim \mathbf{D}^5$) and the Fold-ASPP module. Supervision is applied at two points in this network: one at the end of the FPN branch ($\mathbf{D}^1$), and the other to guide the fusion of the two branches.
  • Figure 3: Detailed illustration of the gate unit. $\mathbf{D}^{i+1}$ denotes the feature maps of the previous decoder block, and S denotes the sigmoid function.
  • Figure 4: Architecture comparison of the gated FPN with gate units-v1 and with gate units-v2.
  • Figure 5: Illustration of different decoder architectures. (a) Progressive structure. (b) Parallel structure. (c) Dual branch structure.
  • ...and 19 more figures
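Figure 2 mentions supervision at two points: the end of the FPN branch D^1 and the fusion of the two branches. A minimal sketch of such a two-term objective follows. It assumes binary cross-entropy for both terms and fusion by summing the two branches' logits; neither choice is stated in this summary, and the names bce and two_point_loss are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(logits: List[float], target: List[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between sigmoid(logits) and a 0/1 mask."""
    if len(logits) != len(target):
        raise ValueError("prediction and mask must have the same size")
    total = 0.0
    for z, t in zip(logits, target):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def two_point_loss(fpn_logits: List[float], parallel_logits: List[float],
                   target: List[float]) -> float:
    """Hypothetical objective: supervise the FPN branch output (D^1) and the
    fused prediction. Fusion by summing logits is an assumption."""
    fused = [a + b for a, b in zip(fpn_logits, parallel_logits)]
    return bce(fpn_logits, target) + bce(fused, target)


# Usage: confident, correct predictions from both branches give a small loss.
mask = [1.0, 0.0]
good = two_point_loss([5.0, -5.0], [5.0, -5.0], mask)
bad = two_point_loss([-5.0, 5.0], [-5.0, 5.0], mask)
```

Supervising D^1 directly keeps the FPN branch useful on its own, while the second term only sees the fused output, so under this sketch the parallel branch is trained through the fusion alone.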