COCONut: Modernizing COCO Segmentation
Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen
TL;DR
This work addresses the limitations of COCO segmentation by introducing COCONut, a large-scale, human-verified universal segmentation dataset that harmonizes semantic, instance, and panoptic annotations across 133 classes. It presents an assisted-manual annotation pipeline and a data engine that scale the training set from 118K to 358K images. Together with a high-quality 25K-image validation set, the dataset totals 383K images and 5.18M masks. Analyses show improved annotation quality and benchmarking stability, while pseudo-labels offer limited gains compared to fully human-labeled data. The dataset enables more reliable evaluation and training of modern segmentation models: larger, high-quality, human-annotated training data improves performance across tasks and backbones, and the more challenging COCONut-val provides a better assessment of models.
Abstract
In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coarse polygon annotations for thing instances, it gradually incorporated coarse superpixel annotations for stuff regions, which were subsequently heuristically amalgamated to yield panoptic segmentation annotations. These annotations, executed by different groups of raters, have resulted not only in coarse segmentation masks but also in inconsistencies between segmentation types. In this study, we undertake a comprehensive reevaluation of the COCO segmentation annotations. By enhancing the annotation quality and expanding the dataset to encompass 383K images with more than 5.18M panoptic masks, we introduce COCONut, the COCO Next Universal segmenTation dataset. COCONut harmonizes segmentation annotations across semantic, instance, and panoptic segmentation with meticulously crafted high-quality masks, and establishes a robust benchmark for all segmentation tasks. To our knowledge, COCONut stands as the inaugural large-scale universal segmentation dataset verified by human raters. We anticipate that the release of COCONut will significantly contribute to the community's ability to assess the progress of novel neural networks.
