
Preserving Marker Specificity with Lightweight Channel-Independent Representation Learning

Simon Gutwein, Arthur Longuefosse, Jun Seita, Sabine Taschner-Mandl, Roxane Licandro

TL;DR

The paper argues that preserving marker independence through a shallow, channel-independent architecture yields better representations for multiplex imaging than traditional early-fusion models that scale up depth or width. It introduces the Channel-Independent Model (CIM) and its compact variant CIM-S, shows strong performance in both supervised and self-supervised settings on CODEX data, and demonstrates robust, interpretable, segmentation-free phenotyping via layer-wise relevance propagation (LRP). Across experiments, architectural inductive bias matters more than model size for retaining marker-specific information and discriminating rare cells, even with reduced marker panels. The findings advocate architecture-centric benchmarking in multiplex imaging and enable efficient, interpretable phenotyping workflows with potential clinical relevance.

Abstract

Multiplexed tissue imaging measures dozens of protein markers per cell, yet most deep learning models still apply early channel fusion, assuming shared structure across markers. We investigate whether preserving marker independence, combined with deliberately shallow architectures, provides a more suitable inductive bias for self-supervised representation learning in multiplex data than increasing model scale. Using a Hodgkin lymphoma CODEX dataset with 145,000 cells and 49 markers, we compare standard early-fusion CNNs with channel-separated architectures, including a marker-aware baseline and our novel shallow Channel-Independent Model (CIM-S) with 5.5K parameters. After contrastive pretraining and linear evaluation, early-fusion models show limited ability to retain marker-specific information and struggle particularly with rare-cell discrimination. Channel-independent architectures, and CIM-S in particular, achieve substantially stronger representations despite their compact size. These findings are consistent across multiple self-supervised frameworks, remain stable across augmentation settings, and are reproducible across both the 49-marker and reduced 18-marker settings. These results show that lightweight, channel-independent architectures can match or surpass deep early-fusion CNNs and foundation models for multiplex representation learning. Code is available at https://github.com/SimonBon/CIM-S.
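The abstract refers to contrastive pretraining without giving the loss. As a minimal sketch, assuming a SimCLR-style NT-Xent objective (one common contrastive loss; the paper evaluates several self-supervised frameworks, which may use different objectives), the NumPy function below scores two batches of embeddings of augmented views V1 and V2 of the same cells. The function name `nt_xent` and the default temperature are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def nt_xent(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss (illustrative sketch, not the authors' code).

    z1[i] and z2[i] embed two augmented views of the same cell (positive pair);
    every other embedding in the combined batch serves as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / temperature                       # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own negative
    # Positive partner of row i is i + N (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Mean negative log-probability assigned to the positive partner.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

Minimizing this loss pulls the two views of each cell together and pushes apart views of different cells; under linear evaluation, the encoder is then frozen and only a linear classifier is trained on its embeddings.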

Paper Structure

This paper contains 14 sections, 1 equation, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Self-supervised learning pipeline for multiplex single-cell representation learning. Left: multi-view augmentations (V1, V2) applied to raw patches. Middle: CIM feature extraction with channel-wise grouped convolutions ($G=C$, i.e., one group per marker channel); BN: BatchNorm. Right: contrastive loss driving similarity for positive pairs and dissimilarity for negative pairs.
  • Figure 2: Benchmarking representation quality. A: supervised training on all 49 markers. B: self-supervised training on all 49 markers with linear evaluation. C: self-supervised training on a reduced 18-marker panel, including a comparison with the KRONOS foundation model. Bubble size reflects model parameter count; bubble color indicates the metric: accuracy (orange) and balanced accuracy (green).
  • Figure 3: Label-free phenotyping workflow. Raw multiplex images are processed to extract cell-centered patches. Relevance scores are computed with LRP on a frozen, self-supervised CIM-S encoder. These scores are aggregated into marker modules to assign phenotypes without supervised training.
  • Figure 4: UMAP of the CIM-S feature space for a subset of 20,000 cells from the cHL dataset. (A) Cells colored by the assigned cell type based on the maximum module score. (B) Colored by channel-wise normalized relevance scores for CD20, CD30, CD11b, and CD7. (C) Colored by module score values for the B cell, tumor, myeloid, and endothelial modules.
  • Figure 5: Phenotyping results. A: Whole-slide annotations and representative local crops illustrating alignment between CIM-S predictions and biological structures (B cells/CD20, T cells/CD4, tumor/CD30, vessels/CD31). B: CD20 vs. CD4 distributions for CD4+ T cells and B cells, comparing CIM-S LRP relevance scores with raw expression signals from the provided labels. WD: Wasserstein distance. C: Cell-type proportion comparison highlighting the shift in T/B cell ratios. E: Local discordance map between CIM-S (left) and baseline labels. F: Raw signal comparison showing diffuse CD4 background versus sharply localized CD20 signal.
  • ...and 4 more figures
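Figure 1 describes CIM feature extraction with channel-wise grouped convolutions ($G=C$). As a minimal NumPy sketch of that operation (not the authors' implementation, which is in the linked repository), each marker channel is convolved only with its own filters, so no output channel mixes information from different markers. The function name `channelwise_conv2d`, the 'valid' padding, and the `M` filters-per-marker layout are illustrative assumptions; BatchNorm and any later layers are omitted.

```python
import numpy as np

def channelwise_conv2d(x, weight, bias=None):
    """Grouped 2-D convolution with groups == channels (G = C), 'valid' padding.

    x:      (C, H, W) multiplex patch, one channel per marker
    weight: (C, M, k, k) with M filters per marker; filters for marker c see only x[c]
    bias:   optional (C, M)
    returns (C * M, H - k + 1, W - k + 1); output index c * M + m belongs to marker c
    """
    C, H, W = x.shape
    Cw, M, k, _ = weight.shape
    assert Cw == C, "one filter group per input channel"
    out = np.zeros((C, M, H - k + 1, W - k + 1))
    for c in range(C):
        # Sliding k x k windows over marker c only: shape (H-k+1, W-k+1, k, k).
        win = np.lib.stride_tricks.sliding_window_view(x[c], (k, k))
        # Cross-correlate every window with each of marker c's M filters.
        out[c] = np.einsum("ijab,mab->mij", win, weight[c])
    if bias is not None:
        out += bias[:, :, None, None]
    return out.reshape(C * M, H - k + 1, W - k + 1)
```

Changing one marker's input alters only that marker's output group, which is the channel independence the figure refers to. The design is also cheap: with illustrative sizes of C = 49 markers, M = 1 filter per marker, and 3x3 kernels, the grouped layer has 49 x 9 = 441 weights, whereas a dense early-fusion 3x3 convolution producing the same 49 output channels needs 49 x 49 x 9 = 21,609.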