C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging

Umar Marikkar, Syed Sameed Husain, Muhammad Awais, Sara Atito

TL;DR

C3R tackles the challenge of varying channel configurations in immunohistochemical (IHC) imaging by introducing a context-concept split and a channel-conditioned encoder. The two-part framework combines a Context-Concept Encoder (CCE) with Masked Context Distillation (MCD) to produce transferable cell representations that enable zero-shot out-of-distribution (OOD) evaluation across datasets such as HPA, JUMP-CP, and CHAMMI, without dataset-specific retraining. Ablation studies show significant gains from the branched, grouped-stem architecture and from the distillation strategy, highlighting the importance of learning distinct context and concept features while using the context as a reference for the concept. Overall, C3R demonstrates improved in-distribution (ID) and competitive OOD performance, opening a path to cross-dataset generalization in IHC imaging and reducing the need for dataset-specific adaptation in clinical and research settings.

Abstract

Immunohistochemical (IHC) images reveal detailed information about structures and functions at the subcellular level. However, unlike natural images, IHC datasets pose challenges for deep learning models due to their inconsistencies in channel count and configuration, stemming from varying staining protocols across laboratories and studies. Existing approaches build channel-adaptive models, which unfortunately fail to support out-of-distribution (OOD) evaluation across IHC datasets and cannot be applied in a true zero-shot setting with mismatched channel counts. To address this, we introduce a structured view of cellular image channels by grouping them into either context or concept, where we treat the context channels as a reference to the concept channels in the image. We leverage this context-concept principle to develop Channel Conditioned Cell Representations (C3R), a framework designed for unified evaluation on in-distribution (ID) and OOD datasets. C3R is a two-fold framework comprising a channel-adaptive encoder architecture and a masked knowledge distillation training strategy, both built around the context-concept principle. We find that C3R outperforms existing benchmarks on both ID and OOD tasks, while a trivial implementation of our core idea also outperforms the channel-adaptive methods reported on the CHAMMI benchmark. Our method opens a new pathway for cross-dataset generalization between IHC datasets, without requiring dataset-specific adaptation or retraining.
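The context-concept grouping described in the abstract can be illustrated with a minimal sketch. The stain names, the `CONTEXT_STAINS` set, and the `split_channels` helper are all hypothetical and chosen only for illustration; the abstract does not state which stains the authors assign to each group.

```python
from typing import List, Tuple

# Hypothetical assignment (not from the paper): stains treated as
# structural "context". Everything else is treated as "concept".
CONTEXT_STAINS = {"nucleus", "microtubules", "er"}


def split_channels(channel_names: List[str]) -> Tuple[List[int], List[int]]:
    """Return (context_indices, concept_indices) for one image's channels."""
    context, concept = [], []
    for index, name in enumerate(channel_names):
        if name.lower() in CONTEXT_STAINS:
            context.append(index)
        else:
            concept.append(index)
    return context, concept


# Two images with different channel counts and orderings map onto the
# same two groups, which is what allows a single model to consume both.
four_channel = ["microtubules", "protein", "nucleus", "er"]
three_channel = ["nucleus", "protein", "actin"]
print(split_channels(four_channel))
print(split_channels(three_channel))
```

The point of the sketch is only that the grouping depends on what a channel represents, not on its position or on the total number of channels in the image.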

Paper Structure

This paper contains 39 sections, 11 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: The intrinsic separation of channels. Context channels serve as structural references and tend to exhibit high visual consistency across cells and datasets. Concept channels capture more variable, experiment-specific phenotypes, and exhibit greater diversity across instances.
  • Figure 2: Overview of C3R. (a) Context-Concept Encoder: The input channels are separated into context and concept groups, and each group is processed independently through its respective $h_c$ and $f_c$ layers. The two group-wise representations are then combined and passed through a shared set of encoder layers $f_s$. (b) Masked Context Distillation: During training, the student encoder $\mathcal{S}$ receives a randomly sampled subset of the context channels, while the teacher encoder $\mathcal{T}$ receives the full set of context channels. The loss is computed between the context-masked student representation and the dense teacher representation.
  • Figure 3: Effects of group switching assignments. The experiments were carried out for JUMP-CP using ViT-S without MCD and $d$ layers per branch.
  • Figure 4: Visualization of channel-wise features in 2D space using UMAP on (a) HPA and (b) JUMP-CP. Most individual channels show clear inter-channel separation and intra-channel similarity between instances.
  • Figure 5: Comparison of performance impact from (a) pre-merging and (b) post-merging depths on HPA (top) and JUMP-CP (bottom). All experiments are performed on the CCE architecture with ViT-S and without Masked Context Distillation.
  • ...and 2 more figures
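
The Masked Context Distillation step in the Figure 2(b) caption can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_encoder` (a channel average) stands in for both the student and the teacher, the mean squared error stands in for a loss the caption does not specify, and keeping at least one context channel is an assumption made here.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sample_context_subset(context_idx: Sequence[int], rng: random.Random) -> List[int]:
    """Keep a random non-empty subset of context channels (student view).

    Assumes at least one context channel; the non-empty constraint is an
    assumption of this sketch, not something stated in the caption.
    """
    keep = rng.randint(1, len(context_idx))
    return sorted(rng.sample(list(context_idx), keep))


def toy_encoder(channels: List[Vector]) -> Vector:
    """Stand-in encoder: element-wise average of per-channel features."""
    count = len(channels)
    return [sum(values) / count for values in zip(*channels)]


def mse(a: Vector, b: Vector) -> float:
    """Stand-in distillation loss between two representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mcd_step(image: List[Vector], context_idx: Sequence[int],
             concept_idx: Sequence[int], rng: random.Random) -> Tuple[float, List[int]]:
    """One distillation step: masked-context student vs. full-context teacher."""
    teacher_in = [image[i] for i in list(context_idx) + list(concept_idx)]
    kept = sample_context_subset(context_idx, rng)
    student_in = [image[i] for i in kept + list(concept_idx)]
    loss = mse(toy_encoder(student_in), toy_encoder(teacher_in))
    return loss, kept


# Four channels with two features each; channels 0-2 are context, 3 is concept.
image = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 3.0]]
loss, kept = mcd_step(image, [0, 1, 2], [3], random.Random(0))
print(kept, loss)
```

In this sketch the concept channels are never masked, so the student always sees the full concept group and only its structural reference is thinned out.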