The Berkeley Single Cell Computational Microscopy (BSCCM) Dataset
Henry Pinkard, Cherry Liu, Fanice Nyatigo, Daniel A. Fletcher, Laura Waller
TL;DR
BSCCM addresses reproducibility challenges in computational microscopy by providing a large, multi-modal single-cell dataset that links LED-array label-free imaging with fluorescence readouts and ground-truth cell-type labels. The approach combines physics-informed acquisition with data-driven analysis across multiple dataset variants (BSCCM, BSCCMNIST, BSCCM-coherent) and includes rich metadata and calibration to support robust benchmarking. Key contributions include large-scale, annotated ground-truth data across multiple illumination contrasts and a pipeline for fluorescence demixing and cross-modal alignment, enabling rigorous evaluation of reconstruction and phenotyping algorithms. The resource is intended to accelerate the development of cost-effective, robust computational microscopy tools for clinical and research applications.
Abstract
Computational microscopy, in which the hardware and algorithms of an imaging system are jointly designed, shows promise for making imaging systems that cost less, perform more robustly, and collect new types of information. Often, the performance of computational imaging systems, especially those that incorporate machine learning, is sample-dependent. Thus, standardized datasets are an essential tool for comparing the performance of different approaches. Here, we introduce the Berkeley Single Cell Computational Microscopy (BSCCM) dataset, which contains over 12,000,000 images of 400,000 individual white blood cells. The dataset contains images captured with multiple illumination patterns on an LED array microscope and fluorescence measurements of the abundance of surface proteins that mark different cell types. We hope this dataset will provide a valuable resource for the development and testing of new algorithms in computational microscopy and computer vision with practical biomedical applications.
