Deep Learning for BioImaging: What Are We Learning?

Ivan Svatko, Maxime Sanchez, Ihab Bendidi, Gilles Cottrell, Auguste Genovesio

Abstract

Representation learning has driven major advances in natural image analysis by enabling models to acquire high-level semantic features. In microscopy imaging, however, it remains unclear what current representation learning methods actually learn. In this work, we conduct a systematic study of representation learning for the two most widely used and broadly available microscopy data types, which represent critical scales in biology: cell culture and tissue imaging. To this end, we introduce a set of simple yet revealing baselines on curated benchmarks, including untrained models and simple structural representations of cellular tissue. Our results show that, surprisingly, state-of-the-art methods perform comparably to these baselines. We further show that, in contrast to what is observed on natural images, existing models fail to consistently acquire high-level, biologically meaningful features. Moreover, we demonstrate that commonly used benchmark metrics are insufficient to assess representation quality and often mask this limitation. In addition, we investigate how detailed comparisons on these benchmarks help interpret the strengths and weaknesses of models and point toward further improvements. Together, our results suggest that progress in microscopy image representation learning requires not only stronger models but also more diagnostic benchmarks that measure what is actually learned.
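
As a rough illustration of what an "untrained model" baseline can look like, the sketch below embeds images with a randomly initialized, never-trained convolution (conv, ReLU, global average pooling) and scores the embeddings with a leave-one-out 1-nearest-neighbor probe. This is a minimal NumPy sketch under our own assumptions: the function names, filter count, kernel size, cosine-similarity probe, and synthetic data are illustrative choices, not the authors' models or evaluation code.

```python
import numpy as np


def untrained_conv_embeddings(images, n_filters=16, kernel=3, seed=0):
    """Embed images with a randomly initialized, never-trained conv layer.

    Illustrative baseline (assumption, not the paper's architecture).
    images: float array of shape (N, H, W, C).
    Returns an (N, n_filters) array: conv -> ReLU -> global average pooling.
    """
    rng = np.random.default_rng(seed)
    channels = images.shape[-1]
    # Random filter bank fixed by the seed; stands in for untrained weights.
    filters = rng.standard_normal((kernel, kernel, channels, n_filters))
    filters /= np.sqrt(kernel * kernel * channels)
    # All valid kernel x kernel windows: shape (N, H-k+1, W-k+1, C, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(
        images, (kernel, kernel), axis=(1, 2)
    )
    # Valid (unpadded) cross-correlation, as computed by deep-learning conv layers.
    responses = np.einsum("nhwcij,ijcf->nhwf", windows, filters)
    # ReLU followed by global average pooling over spatial positions.
    return np.maximum(responses, 0.0).mean(axis=(1, 2))


def leave_one_out_1nn_accuracy(features, labels):
    """Fraction of samples whose nearest other sample (cosine) shares its label."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    x = features / np.maximum(norms, 1e-12)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # a sample may not be its own neighbor
    nearest = sim.argmax(axis=1)
    labels = np.asarray(labels)
    return float(np.mean(labels[nearest] == labels))


# Toy usage on synthetic two-class data (not microscopy images).
rng = np.random.default_rng(1)
images = rng.normal(0.0, 0.1, size=(40, 16, 16, 2))
images[:20, ..., 0] += 1.0  # class 0: bright first channel
images[20:, ..., 1] += 1.0  # class 1: bright second channel
labels = np.array([0] * 20 + [1] * 20)
feats = untrained_conv_embeddings(images)
print(feats.shape, leave_one_out_1nn_accuracy(feats, labels))
```

On such easy synthetic data even random features separate the classes, which is the kind of effect an untrained baseline is meant to expose: a benchmark score alone does not show whether a trained model has learned anything beyond what fixed random features already capture.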

Paper Structure (75 sections, 3 equations, 25 figures, 11 tables)

Figures (25)

  • Figure 1: Comparison between trained and untrained models. (a) Task-dependent metrics on natural images and Cell Painting data. (b) Correlations between predictions on Cell Painting data. Large-scale subfigures are available in the appendix.
  • Figure 2: Comparison of the performance of intermediate layers. Minimum and maximum scores are reported. (a) Classification on ImageNet-1k (in blue) strongly favors deeper layers of pretrained models, unlike the relationship retrieval tasks on RxRx3-core (in red). (b) Replicate retrieval for a subset of JUMP-CP compounds, grouped by architecture.
  • Figure 3: Mean average precision (mAP) per compound on the JUMP-CP benchmark. Bar plots show mAP scores for eight positive-control compounds and the mean across compounds. Each bar corresponds to a model configuration, grouped and colored by architecture family. Solid bars denote pretrained models, hatched bars denote untrained models, and pixel-based baselines are included for reference. Error bars indicate variability across evaluation folds. An upscaled version is available in the appendix.
  • Figure 4: Analysis of results on HEST-1k-1NN. (a) Comparison between the best structure-only model and H-Optimus-1. (b) Performance analysis of intermediate layers.
  • Figure 5: Example images from the 8 classes sampled for the mAP experiment.
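
Figure 3 reports mAP for replicate retrieval. As a hedged sketch of how such a score can be computed, the snippet below treats each profile as a query, ranks all other profiles by cosine similarity, counts the other replicates of the same compound as positives, and averages the per-query average precision (AP) within each compound. The function names, the use of cosine similarity, and the choice of all other profiles as the candidate set are assumptions for illustration; the JUMP-CP benchmark protocol may differ, for example in how negatives are selected.

```python
import numpy as np


def average_precision(relevance):
    """AP of a binary relevance vector already sorted by decreasing score."""
    rel = np.asarray(relevance, dtype=float)
    n_pos = rel.sum()
    if n_pos == 0:
        return float("nan")  # no positives: AP is undefined
    precision_at_k = np.cumsum(rel) / np.arange(1, rel.size + 1)
    # Average the precision values at the ranks where positives occur.
    return float((precision_at_k * rel).sum() / n_pos)


def replicate_map_per_compound(embeddings, compounds):
    """Mean AP per compound, using each profile in turn as a query (illustrative)."""
    emb = np.asarray(embeddings, dtype=float)
    x = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    sim = x @ x.T  # cosine similarity between all pairs of profiles
    compounds = np.asarray(compounds)
    idx = np.arange(len(compounds))
    ap = np.empty(len(compounds))
    for i in idx:
        others = idx[idx != i]  # the query is excluded from its own ranking
        order = others[np.argsort(-sim[i, others], kind="stable")]
        # Positives are the other replicates of the query's compound.
        ap[i] = average_precision(compounds[order] == compounds[i])
    return {c.item(): float(np.nanmean(ap[compounds == c])) for c in np.unique(compounds)}


# Toy usage: two compounds with two well-separated replicates each.
profiles = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
print(replicate_map_per_compound(profiles, ["a", "a", "b", "b"]))
# {'a': 1.0, 'b': 1.0}
```

A compound with a single profile has no positives, so its AP is NaN and NumPy warns when averaging an all-NaN slice; real evaluations typically require at least two replicates per compound.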