MaskUno: Switch-Split Block For Enhancing Instance Segmentation
Jawad Haidar, Marc Mouawad, Imad Elhajj, Daniel Asmar
TL;DR
MaskUno tackles the competing-kernels problem in multi-class instance segmentation with a Switch-Split block. After bounding-box refinement, each ROI is classified and delegated to a dedicated mask head for its class. This decouples mask learning across classes, and the block can be embedded into Mask R-CNN, Cascade Mask R-CNN, and HTC. On COCO it yields consistent mAP gains, including a 2.03% improvement for DetectoRS trained on 80 classes and up to 4.8% for Mask R-CNN. The training objective uses $L_{cls}$, $L1$, and $L_{mask}$, with the mask loss split into per-class terms $L_{mask_i}$ so that each mask head is optimized independently. The approach is demonstrated across multiple architectures and may yield further gains with transformer-based backbones and cross-block refinements.
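The routing and per-class loss described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: the names `switch_split`, `per_class_mask_loss`, and `binary_cross_entropy`, the argmax routing rule, the per-pixel binary cross-entropy, and summing the per-class terms into one mask loss are all assumptions. Real mask heads would be convolutional networks on ROI-aligned feature maps; here any callable stands in for a head.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-ins: an ROI is a flat feature vector and a mask is a flat
# list of per-pixel foreground probabilities. Real heads would be CNNs on
# ROI-aligned feature maps.
Feature = List[float]
Mask = List[float]
MaskHead = Callable[[Feature], Mask]


def switch_split(
    roi_features: Sequence[Feature],
    class_scores: Sequence[Sequence[float]],
    mask_heads: Dict[int, MaskHead],
) -> Tuple[List[int], List[Mask]]:
    """Route each ROI to the dedicated mask head of its top-scoring class."""
    labels: List[int] = []
    masks: List[Mask] = []
    for feat, scores in zip(roi_features, class_scores):
        # Switch (assumed rule): pick the class with the highest score.
        label = max(range(len(scores)), key=scores.__getitem__)
        labels.append(label)
        # Split: only that class's head predicts a mask for this ROI.
        masks.append(mask_heads[label](feat))
    return labels, masks


def binary_cross_entropy(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy (assumed choice of mask loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def per_class_mask_loss(
    labels: Sequence[int],
    pred_masks: Sequence[Mask],
    target_masks: Sequence[Mask],
    loss_fn: Callable[[Mask, Mask], float] = binary_cross_entropy,
) -> Dict[int, float]:
    """Accumulate a separate loss L_mask_i for every class i that received ROIs."""
    losses: Dict[int, float] = {}
    for label, pred, target in zip(labels, pred_masks, target_masks):
        losses[label] = losses.get(label, 0.0) + loss_fn(pred, target)
    return losses


if __name__ == "__main__":
    labels, masks = switch_split(
        [[0.5, 0.5], [0.1, 0.9], [0.7, 0.3]],
        [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]],
        {0: lambda f: [0.9, 0.1], 1: lambda f: [0.2, 0.8]},
    )
    losses = per_class_mask_loss(labels, masks, [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    # Assumption: the overall mask loss sums the per-class terms.
    print(labels, losses, sum(losses.values()))
```

Because each ROI reaches exactly one head, a given class's loss term depends only on that class's head, which is the decoupling the Switch-Split block aims for. `per_class_mask_loss` takes the labels as a parameter, so ground-truth labels could be passed instead of predicted ones (as Mask R-CNN does when selecting the mask for the ground-truth class during training); whether MaskUno routes this way during training is not stated here.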
Abstract
Instance segmentation is an advanced form of image segmentation that, beyond traditional segmentation, requires identifying individual instances of repeating objects in a scene. Mask R-CNN is the most common architecture for instance segmentation, and improvements to it include bounding-box refinement, added semantics, and backbone enhancements. In all variations proposed to date, the problem of competing kernels (each class aims to maximize its own accuracy) persists when a model learns numerous classes simultaneously. In this paper, we propose mitigating this problem by replacing mask prediction with a Switch-Split block that processes refined ROIs, classifies them, and assigns them to specialized mask predictors. We name the method MaskUno and test it on various models from the literature, trained on multiple classes of the benchmark COCO dataset. The mean Average Precision (mAP) of the high-performing DetectoRS increased by 2.03% when trained on 80 classes. MaskUno proved to enhance the mAP of instance segmentation models regardless of the number and typ
