MaskUno: Switch-Split Block For Enhancing Instance Segmentation

Jawad Haidar, Marc Mouawad, Imad Elhajj, Daniel Asmar

TL;DR

MaskUno tackles the competing-kernels problem in multi-class instance segmentation by introducing a Switch-Split block that routes each ROI, once its bounding box has been refined and classified, to a dedicated per-class mask head. This decouples mask learning across classes, and the block can be embedded into Mask-RCNN, Cascade Mask-RCNN, and HTC. On COCO it yields consistent mAP gains, including a 2.03% improvement for DetectoRS trained on 80 classes and up to 4.8% for Mask-RCNN. The losses ($L_{cls}$, $L1$, and $L_{mask}$, with per-class terms $L_{mask_i}$) are defined so that each mask head is optimized independently. The approach is demonstrated across multiple architectures and leaves room for further gains with transformer-based backbones and cross-block refinements.
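The routing idea behind the Switch-Split block can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the function `switch_split`, the toy `make_threshold_head` predictor, and the list-based ROI features are hypothetical stand-ins chosen only to show how each ROI is sent to the mask head of its predicted class and how outputs are restored to the original ROI order.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a mask head maps a batch of ROI features to a batch of masks.
MaskHead = Callable[[List[Sequence[float]]], List[List[int]]]


def make_threshold_head(threshold: float) -> MaskHead:
    """Toy stand-in for a per-class mask predictor (assumption, not from the paper)."""
    def head(rois: List[Sequence[float]]) -> List[List[int]]:
        # Binarize each feature value against this head's own threshold.
        return [[1 if value > threshold else 0 for value in roi] for roi in rois]
    return head


def switch_split(
    rois: List[Sequence[float]],
    class_ids: List[int],
    heads: Dict[int, MaskHead],
) -> List[List[int]]:
    """Route each ROI to the head of its predicted class, preserving ROI order."""
    # Switch: group ROI indices by predicted class.
    groups: Dict[int, List[int]] = defaultdict(list)
    for index, class_id in enumerate(class_ids):
        groups[class_id].append(index)

    # Split: each group is processed only by its own class-specific head.
    masks: List[List[int]] = [[] for _ in rois]
    for class_id, indices in groups.items():
        outputs = heads[class_id]([rois[i] for i in indices])
        for i, mask in zip(indices, outputs):
            masks[i] = mask
    return masks


# Two classes with separate heads; identical features yield different masks
# depending on which head the ROI is routed to.
heads = {0: make_threshold_head(0.5), 1: make_threshold_head(0.2)}
rois = [[0.1, 0.6, 0.3], [0.1, 0.6, 0.3], [0.9, 0.0, 0.4]]
class_ids = [0, 1, 0]
masks = switch_split(rois, class_ids, heads)
```

Because each head only ever sees ROIs of its own class, a per-class mask loss computed on its outputs would update that head alone, which is the decoupling the TL;DR describes.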

Abstract

Instance segmentation is an advanced form of image segmentation which, beyond traditional segmentation, requires identifying individual instances of repeating objects in a scene. Mask R-CNN is the most common architecture for instance segmentation, and improvements to this architecture include steps such as benefiting from bounding box refinements, adding semantics, or backbone enhancements. In all the proposed variations to date, the problem of competing kernels (each class aims to maximize its own accuracy) persists when models try to synchronously learn numerous classes. In this paper, we propose mitigating this problem by replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors. We name the method MaskUno and test it on various models from the literature, which are then trained on multiple classes using the benchmark COCO dataset. An increase in the mean Average Precision (mAP) of 2.03% was observed for the high-performing DetectoRS when trained on 80 classes. MaskUno proved to enhance the mAP of instance segmentation models regardless of the number and typ

Paper Structure (20 sections, 3 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Switch-Split block architecture
  • Figure 2: Switch-Split applied to Cascade Mask-RCNN
  • Figure 3: Switch-Split applied to Hybrid Task Cascade
  • Figure 4: Bar graph showing the mAP of Mask-RCNN and DetectoRS before and after applying MaskUno
  • Figure 5: Examples of various predictions