Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran

TL;DR

The paper tackles open-world instance segmentation by addressing the bias of closed-world, top-down models toward seen categories. It introduces UDOS, a unified framework that learns a top-down part-mask predictor under weak supervision from class-agnostic bottom-up segmentations, then merges parts with an affinity-based grouping and refines the results with a RoIHeads-like refinement head. Across cross-category and cross-dataset benchmarks, UDOS outperforms state-of-the-art methods, establishing new baselines on COCO, UVO, ADE20K, OpenImages, and LVIS. The approach demonstrates strong generalization to unseen categories and offers a practical, efficient solution with potential for further gains by integrating recent open-world tools like SAM.

Abstract

Many top-down architectures for instance segmentation achieve significant success when trained and tested on a pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit a notable bias towards seen classes and suffer a significant performance drop. In this work, we propose a novel approach for open-world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS) that combines classical bottom-up segmentation algorithms within a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part-masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS enjoys both the speed and efficiency of top-down architectures and the generalization ability to unseen categories that comes from bottom-up supervision. We validate the strengths of UDOS on multiple cross-category as well as cross-dataset transfer tasks from 5 challenging datasets including MS-COCO, LVIS, ADE20k, UVO and OpenImages, achieving significant improvements over state-of-the-art across the board. Our code and models are available on our project page.

Paper Structure

This paper contains 15 sections, 3 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Open world segmentation using UDOS. Image from COCO. (a) Mask R-CNN (He et al., 2017), trained on VOC-categories from COCO, fails to detect many unseen categories due to seen-class bias; (b) MCG (Pont-Tuset et al., 2016) provides diverse proposals, but predicts many over-segmented false-positives with noisy boundaries; (c) combining the advantages of (a) and (b) into a joint framework, UDOS efficiently detects unseen classes in the open world when trained only using VOC-categories from COCO, while adding negligible inference time overhead.
  • Figure 2: UDOS overview
  • Figure 3: Proposed UDOS pipeline
  • Figure 4: Grouping module. (a) The bounding boxes $b_i$ of the predicted part-masks are expanded to $b_i'$ to incorporate local context. (b) The features $f_{b,i}$ are extracted using the RoIAlign operator on the FPN features $\mathcal{F}$ with the expanded bounding boxes $b_i'$, and are used to compute the pairwise affinity $\phi(b_i,b_j)$ using cosine similarity. (c) A clustering algorithm is used to group parts into whole object instances, as shown in (d). Note that inaccuracies in the output of the grouping module are later corrected by the refinement module.
  • Figure 5: Visualization of segmentations for a model trained only on VOC classes from the COCO dataset. The top row shows results using Mask-RCNN$_{SC}$, the second row shows the output of UDOS, and the third row shows some predictions made only by UDOS and missed by Mask-RCNN$_{SC}$. We also show the number of detections made by the network below each image. Starting from the leftmost image, many classes such as {jug, tissue papers, tie, eyeglasses}, {knife, cutting board, vegetables, glass}, {shoes, helmet, gloves}, {ostrich} and {dishwasher, faucet}, among others, which are not part of the VOC classes, are missed by standard Mask-RCNN training but detected using UDOS. More visualizations are provided in the supplementary.
  • ...and 6 more figures
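
The grouping step described in the Figure 4 caption (expand part boxes, compare per-part features with cosine similarity, cluster parts into instances) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the expansion fraction `delta`, and the affinity `threshold` are hypothetical choices made for illustration. The per-part features are assumed to be given as plain vectors (in the paper they come from RoIAlign on FPN features), and the clustering here is a simple connected-components grouping over thresholded affinities, used as a stand-in for whichever clustering algorithm the paper employs.

```python
import numpy as np


def expand_boxes(boxes, delta=0.1, image_size=None):
    """Grow each (x1, y1, x2, y2) box by a fraction `delta` of its width and
    height on every side, optionally clipping to (height, width)."""
    boxes = np.asarray(boxes, dtype=float)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    grow = np.stack([-delta * w, -delta * h, delta * w, delta * h], axis=1)
    out = boxes + grow
    if image_size is not None:
        height, width = image_size
        out[:, [0, 2]] = out[:, [0, 2]].clip(0, width)
        out[:, [1, 3]] = out[:, [1, 3]].clip(0, height)
    return out


def cosine_affinity(features):
    """Pairwise cosine similarity between per-part feature vectors."""
    f = np.asarray(features, dtype=float)
    f = f.reshape(f.shape[0], -1)
    norms = np.linalg.norm(f, axis=1, keepdims=True)
    f = f / np.maximum(norms, 1e-12)  # guard against zero vectors
    return f @ f.T


def group_parts(affinity, threshold=0.8):
    """Assumed stand-in for the clustering step: parts whose affinity is at
    least `threshold` are linked, and each connected component is a group."""
    n = affinity.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i, j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive group ids in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    part_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy features
    print(group_parts(cosine_affinity(part_features)))  # prints [0, 0, 1]
```

In this toy example the first two parts have nearly parallel feature vectors and are merged into one group, while the third stays separate. As the caption notes, errors at this stage would be left to the refinement module to correct.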