Glioma C6: A Novel Dataset for Training and Benchmarking Cell Segmentation
Roman Malashin, Svetlana Pashkevich, Daniil Ilyukhin, Arseniy Volkov, Valeria Yachnaya, Andrey Denisov, Maria Mikhalkova
TL;DR
The paper introduces Glioma C6, a phase-contrast microscopy dataset of glioma C6 cells with soma annotations and two morphological cell-type labels, designed for training and benchmarking instance segmentation models. The corpus comprises 75 images with more than 12,000 annotated cells, split into a spec subset, curated under controlled parameters for benchmarking, and a gen subset for testing generalization under varying conditions. Generalist segmentation models perform poorly on the dataset without fine-tuning, whereas fine-tuning on Glioma C6 yields robust performance across varied imaging conditions. The paper also analyzes annotation uncertainty, showing inherent ambiguity in dense, overlapping morphologies and noting that model predictions can sometimes agree with expert consensus better than the original ground truth does. Overall, Glioma C6 provides a realistic, challenging benchmark for cell segmentation and phenotyping in dense tumor-like environments and supports model adaptation studies.
Abstract
We present Glioma C6, a new open dataset for instance segmentation of glioma C6 cells, designed as both a benchmark and a training resource for deep learning models. The dataset comprises 75 high-resolution phase-contrast microscopy images with over 12,000 annotated cells, providing a realistic testbed for biomedical image analysis. It includes soma annotations and morphological cell categorization provided by biologists; the categorization is intended to make the image data more useful for cancer cell research. Glioma C6 consists of two parts: the first is curated with controlled parameters for benchmarking, while the second supports generalization testing under varying conditions. We evaluate several generalist segmentation models and highlight their limitations on our dataset. Our experiments demonstrate that training on Glioma C6 significantly enhances segmentation performance, reinforcing its value for developing robust and generalizable models. The dataset is publicly available for researchers.
