C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging
Umar Marikkar, Syed Sameed Husain, Muhammad Awais, Sara Atito
TL;DR
C3R addresses varying channel configurations in immunohistochemical (IHC) imaging by introducing a context-concept split and a channel-conditioned encoder. The two-part framework combines a Context-Concept Encoder (CCE) with Masked Context Distillation (MCD) to produce transferable cell representations, enabling zero-shot out-of-distribution (OOD) evaluation across datasets such as HPA, JUMP-CP, and CHAMMI without dataset-specific retraining. Ablation studies show significant gains from the branched, grouped-stem architecture and from the distillation strategy, which highlights the value of learning distinct context and concept features while using context as a reference for the concept. Overall, C3R demonstrates improved in-distribution (ID) and competitive OOD performance, opening a path to cross-dataset generalization in IHC imaging and reducing the need for dataset-specific adaptation in clinical and research settings.
Abstract
Immunohistochemical (IHC) images reveal detailed information about structures and functions at the subcellular level. However, unlike natural images, IHC datasets pose challenges for deep learning models because their channel count and configuration are inconsistent, a result of staining protocols that vary across laboratories and studies. Existing approaches build channel-adaptive models, but these do not support out-of-distribution (OOD) evaluation across IHC datasets and cannot be applied in a true zero-shot setting with mismatched channel counts. To address this, we introduce a structured view of cellular image channels by grouping them into either context or concept, where we treat the context channels as a reference for the concept channels in the image. We leverage this context-concept principle to develop Channel Conditioned Cell Representations (C3R), a framework designed for unified evaluation on in-distribution (ID) and OOD datasets. C3R is a two-fold framework comprising a channel-adaptive encoder architecture and a masked knowledge distillation training strategy, both built around the context-concept principle. We find that C3R outperforms existing benchmarks on both ID and OOD tasks, while a trivial implementation of our core idea also outperforms the channel-adaptive methods reported on the CHAMMI benchmark. Our method opens a new pathway for cross-dataset generalization between IHC datasets, without requiring dataset-specific adaptation or retraining.
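
The context-concept grouping described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_context_concept`, the channel names, and the choice of which channels count as context are all hypothetical, chosen only to show how images with different channel counts can be reduced to the same two-group view.

```python
import numpy as np

def split_context_concept(image, channel_names, context_names):
    """Split a (C, H, W) image into context and concept channel stacks.

    Hypothetical helper: which channels are 'context' is an assumption
    supplied by the caller, not something defined by the paper here.
    """
    if image.shape[0] != len(channel_names):
        raise ValueError("one name is required per channel")
    context_set = set(context_names)
    # Integer index arrays keep the original channel order within each group.
    ctx_idx = np.asarray(
        [i for i, n in enumerate(channel_names) if n in context_set], dtype=int
    )
    con_idx = np.asarray(
        [i for i, n in enumerate(channel_names) if n not in context_set], dtype=int
    )
    return image[ctx_idx], image[con_idx]

rng = np.random.default_rng(0)

# Hypothetical dataset A: 4 channels, 3 treated as context.
img_a = rng.random((4, 8, 8))
names_a = ["ref_0", "ref_1", "ref_2", "target_0"]
ctx_a, con_a = split_context_concept(img_a, names_a, ["ref_0", "ref_1", "ref_2"])

# Hypothetical dataset B: 5 channels, 2 treated as context.
img_b = rng.random((5, 8, 8))
names_b = ["ref_0", "ref_1", "target_0", "target_1", "target_2"]
ctx_b, con_b = split_context_concept(img_b, names_b, ["ref_0", "ref_1"])

print(ctx_a.shape, con_a.shape)
print(ctx_b.shape, con_b.shape)
```

Both images end up as a (context, concept) pair even though their channel counts differ, which is the structural property the abstract relies on. How each group is then encoded is left to the channel-adaptive encoder and is not sketched here.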
