ContourFormer: Real-Time Contour-Based End-to-End Instance Segmentation Transformer

Weiwei Yao, Chen Li, Minjun Xiong, Wenbo Dong, Hao Chen, Xiong Xiao

TL;DR

Contourformer targets fast and precise contour-based instance segmentation by building on the DETR framework and refining contours iteratively. It has two core innovations: sub-contour decoupling, which decomposes each contour into manageable local parts and reduces self-attention complexity, and contour fine-grained distribution refinement (CFDR), which models boundary uncertainty with multi-layer probabilistic distributions and residual updates. The method achieves strong performance on SBD, COCO, and KINS at real-time speeds, outperforming existing contour-based approaches and approaching or exceeding mask-based baselines in some settings. Its end-to-end, NMS-free design and probabilistic contour modeling make it a practical and scalable baseline for contour-based instance segmentation, with room for further improvement and use in real-time applications.
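
The residual distribution updates mentioned above can be illustrated with a minimal sketch. It assumes each contour coordinate carries logits over a small set of candidate offsets, that every later decoder layer adds a residual to those logits, and that the refined coordinate is the initial one plus the expected offset. The bin values, the scale factor, and the function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_offset(logits, bin_values):
    """Expected offset under the distribution defined by the logits."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, bin_values))

def refine(initial_coord, initial_logits, residuals, bin_values, scale=1.0):
    """Apply one residual logit update per layer; return coordinates after each layer.

    Assumption: the coordinate is always decoded relative to the initial
    prediction, so only the distribution changes from layer to layer.
    """
    logits = list(initial_logits)
    coords = [initial_coord + scale * expected_offset(logits, bin_values)]
    for delta in residuals:
        logits = [l + d for l, d in zip(logits, delta)]  # residual update
        coords.append(initial_coord + scale * expected_offset(logits, bin_values))
    return coords, logits

# Hypothetical setup: five offset bins and two refinement layers.
bin_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
initial_logits = [0.0] * 5
residuals = [[0.0, 0.0, 0.0, 1.0, 2.0], [0.0, 0.0, 0.0, 1.0, 2.0]]
coords, final_logits = refine(50.0, initial_logits, residuals, bin_values)
```

In this toy setup the uniform starting distribution leaves the coordinate unchanged, and each residual update shifts probability mass toward positive offsets, so the decoded coordinate moves further in that direction with every layer.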

Abstract

This paper presents Contourformer, a real-time contour-based instance segmentation algorithm. The method is fully based on the DETR paradigm and achieves end-to-end inference through iterative and progressive mechanisms to optimize contours. To improve efficiency and accuracy, we develop two novel techniques: sub-contour decoupling mechanisms and contour fine-grained distribution refinement. In the sub-contour decoupling mechanism, we propose a deformable attention-based module that adaptively selects sampling regions based on the current predicted contour, enabling more effective capturing of object boundary information. Additionally, we design a multi-stage optimization process to enhance segmentation precision by progressively refining sub-contours. The contour fine-grained distribution refinement technique aims to further improve the ability to express fine details of contours. These innovations enable Contourformer to achieve stable and precise segmentation for each instance while maintaining real-time performance. Extensive experiments demonstrate the superior performance of Contourformer on multiple benchmark datasets, including SBD, COCO, and KINS. We conduct comprehensive evaluations and comparisons with existing state-of-the-art methods, showing significant improvements in both accuracy and inference speed. This work provides a new solution for contour-based instance segmentation tasks and lays a foundation for future research, with the potential to become a strong baseline method in this field.

Paper Structure

This paper contains 14 sections, 5 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview of Contourformer: Given an image, multi-scale features are collected through the Backbone and Encoder. Initial bounding boxes and Query features for each object are proposed. A simple ellipse is used to initialize a polygon contour for each target, which is then divided into eight sub-contours. Corresponding bounding boxes are created for these sub-contours. N stacked Transformer decoder layers iteratively refine each sub-contour, with the bounding box of each sub-contour providing the feature extraction range for cross-attention. The FDR head provides fine-grained distributions for each boundary point during iterations. The network employs a denoising module to accelerate training and enhance accuracy.
  • Figure 2: Sub-contour decoupling mechanism: Each decoder layer uses self-attention and cross-attention to update the queries. It performs two rounds of self-attention, first among $\left\{q_{j}^{\mathrm{c}}\right\}_{j=0}^{N_{c}-1}$ and then among $\left\{q_{i}^{\mathrm{ins}}\right\}_{i=0}^{N_{q}-1}$. During cross-attention, the bounding box $\left\{x, y, w, h\right\}$ corresponding to the sub-contour $\left\{v_{i}\right\}_{i=0}^{N_{s}-1}$ predicted in the previous layer is used as the range for feature sampling and query updating.
  • Figure 3: Contour Fine-Grained Distribution Refinement (CFDR): The first decoder layer predicts the initial contour and a preliminary probability distribution using a conventional regression head and a CFDR head. Each subsequent layer then applies residual adjustments to update the probability distributions, resulting in more precise boundary localization.
  • Figure 4: Qualitative results of Contourformer on the SBD val set
  • Figure 5: Qualitative results of Contourformer on the COCO val set
  • ...and 1 more figure
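
The contour initialization described in the Figure 1 caption (an ellipse inside the proposed box, split into eight sub-contours, each with its own bounding box) can be sketched as follows. This is only an illustration under stated assumptions: the number of vertices (64), the `(cx, cy, w, h)` box format, the non-overlapping equal-length split, and all function names are hypothetical and not taken from the paper.

```python
import math

def ellipse_contour(cx, cy, w, h, num_points=64):
    """Sample num_points vertices on the ellipse inscribed in box (cx, cy, w, h)."""
    pts = []
    for k in range(num_points):
        theta = 2.0 * math.pi * k / num_points
        pts.append((cx + 0.5 * w * math.cos(theta),
                    cy + 0.5 * h * math.sin(theta)))
    return pts

def split_sub_contours(points, num_sub=8):
    """Split a closed polygon into num_sub consecutive, equally sized vertex runs."""
    if len(points) % num_sub != 0:
        raise ValueError("number of points must be divisible by num_sub")
    step = len(points) // num_sub
    return [points[i * step:(i + 1) * step] for i in range(num_sub)]

def bounding_box(points):
    """Axis-aligned box (cx, cy, w, h) enclosing the given vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0, x1 - x0, y1 - y0)

# Hypothetical initial box for one object query.
contour = ellipse_contour(cx=100.0, cy=80.0, w=60.0, h=40.0, num_points=64)
subs = split_sub_contours(contour, num_sub=8)
sub_boxes = [bounding_box(s) for s in subs]  # one sampling range per sub-contour
```

Each entry of `sub_boxes` covers only a local arc of the ellipse, which is the kind of restricted range the caption describes as the feature extraction region for cross-attention.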