Preserving Marker Specificity with Lightweight Channel-Independent Representation Learning
Simon Gutwein, Arthur Longuefosse, Jun Seita, Sabine Taschner-Mandl, Roxane Licandro
TL;DR
The paper argues that preserving marker independence through a shallow, channel-independent architecture yields better representations of multiplex imaging data than traditional early-fusion models that scale up depth or width. It introduces the Channel-Independent Model (CIM) and its compact variant CIM-S. Both perform strongly in supervised and self-supervised settings on CODEX data and support robust, interpretable, segmentation-free phenotyping via layer-wise relevance propagation (LRP). Across experiments, architectural inductive bias matters more than model size for retaining marker-specific information and discriminating rare cells, even with reduced marker panels. The findings argue for architecture-centric benchmarking in multiplex imaging and enable efficient, interpretable phenotyping workflows with potential clinical relevance.
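The TL;DR credits layer-wise relevance propagation (LRP) for interpretable phenotyping. This excerpt does not say which LRP rules or which network the authors use, so the following is only a generic sketch of the standard LRP-ε rule on a tiny dense ReLU network with made-up weights. The functions `dense`, `lrp_epsilon` and `explain` are hypothetical names. Relevance for the target logit is passed backward to each input in proportion to its contribution a_i·w_ij to the pre-activation z_j, with a small ε stabilizing the division.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # w[i][j]: weight from input i to output j
Layer = Tuple[Matrix, List[float]]  # (weights, biases)


def dense(a: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Pre-activations z_j = sum_i a_i * w_ij + b_j."""
    return [sum(a[i] * w[i][j] for i in range(len(a))) + b[j] for j in range(len(b))]


def lrp_epsilon(a: List[float], w: Matrix, b: List[float],
                r_out: List[float], eps: float = 1e-6) -> List[float]:
    """LRP-epsilon rule: R_i = a_i * sum_j w_ij * R_j / (z_j + eps * sign(z_j))."""
    z = dense(a, w, b)
    s = [r_out[j] / (z[j] + (eps if z[j] >= 0 else -eps)) for j in range(len(z))]
    return [a[i] * sum(w[i][j] * s[j] for j in range(len(s))) for i in range(len(a))]


def explain(x: List[float], layers: List[Layer], target: int,
            eps: float = 1e-6) -> List[float]:
    """Forward pass (ReLU on hidden layers, linear output), then propagate
    the target logit back to the inputs layer by layer."""
    acts = [x]
    for k, (w, b) in enumerate(layers):
        z = dense(acts[-1], w, b)
        is_last = k == len(layers) - 1
        acts.append(z if is_last else [max(0.0, v) for v in z])
    logits = acts[-1]
    # Start with relevance only on the explained class.
    r = [logits[j] if j == target else 0.0 for j in range(len(logits))]
    for k in reversed(range(len(layers))):
        w, b = layers[k]
        r = lrp_epsilon(acts[k], w, b, r, eps)
    return r


if __name__ == "__main__":
    layers = [
        ([[0.5, -0.2], [0.3, 0.4], [-0.1, 0.6]], [0.0, 0.0]),  # 3 inputs -> 2 hidden
        ([[1.0, -0.5], [0.2, 0.7]], [0.0, 0.0]),               # 2 hidden -> 2 classes
    ]
    # Approximately [0.46, 0.76, 0.01]; they sum to the class-0 logit, about 1.23.
    print(explain([1.0, 2.0, 0.5], layers, target=0))
```

With zero biases and a small ε, the input relevances approximately sum to the explained logit, which is the conservation property LRP is built around. In a channel-separated encoder, summing relevance over the features that belong to one marker would give a per-marker score. That is only an illustration; whether the paper aggregates relevance this way is not stated here.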
Abstract
Multiplexed tissue imaging measures dozens of protein markers per cell, yet most deep learning models still apply early channel fusion, implicitly assuming shared structure across markers. We investigate whether preserving marker independence, combined with deliberately shallow architectures, provides a more suitable inductive bias for self-supervised representation learning on multiplex data than increasing model scale. Using a Hodgkin lymphoma CODEX dataset with 145,000 cells and 49 markers, we compare standard early-fusion CNNs with channel-separated architectures, including a marker-aware baseline and our novel shallow Channel-Independent Model (CIM-S) with only 5.5K parameters. After contrastive pretraining and linear evaluation, early-fusion models retain little marker-specific information and struggle in particular to discriminate rare cell types. Channel-independent architectures, and CIM-S in particular, learn substantially stronger representations despite their compact size. These findings are consistent across multiple self-supervised frameworks, remain stable across augmentation settings, and hold for both the full 49-marker panel and a reduced 18-marker panel. Overall, lightweight channel-independent architectures can match or surpass deep early-fusion CNNs and foundation models for multiplex representation learning. Code is available at https://github.com/SimonBon/CIM-S.
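The abstract contrasts early channel fusion with channel-separated encoders. The pure-Python sketch below illustrates that contrast under explicit assumptions. The class names `ChannelIndependentEncoder` and `EarlyFusionEncoder`, the 3×3 filters, and the ReLU plus global-average-pooling head are all hypothetical, and this is not the CIM-S architecture, whose layers are not described in this excerpt. The sketch is only meant to show two structural properties. First, a filter bank shared across markers has a parameter count that does not depend on the number of markers. Second, each block of the channel-independent embedding depends on exactly one marker.

```python
import random
from typing import List

Patch = List[List[float]]  # one marker channel of a cell patch, H x W


def conv2d_valid(x: Patch, k: Patch) -> Patch:
    """Single-channel 2D cross-correlation without padding ('valid')."""
    kh, kw = len(k), len(k[0])
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(len(x[0]) - kw + 1)]
        for i in range(len(x) - kh + 1)
    ]


def relu_gap(fmap: Patch, bias: float) -> float:
    """Add bias, apply ReLU, then global-average-pool to one scalar."""
    vals = [max(0.0, v + bias) for row in fmap for v in row]
    return sum(vals) / len(vals)


def random_kernel(rng: random.Random, size: int) -> Patch:
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


class ChannelIndependentEncoder:
    """Hypothetical shallow encoder: one small filter bank, shared by all
    markers and applied to each channel separately (no channel mixing)."""

    def __init__(self, n_filters: int = 4, ksize: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.kernels = [random_kernel(rng, ksize) for _ in range(n_filters)]
        self.biases = [rng.gauss(0.0, 0.1) for _ in range(n_filters)]

    def num_params(self) -> int:
        # Independent of the number of markers.
        return sum(len(k) * len(k[0]) for k in self.kernels) + len(self.biases)

    def encode(self, channels: List[Patch]) -> List[float]:
        # Concatenation of per-marker blocks, each holding n_filters features.
        return [relu_gap(conv2d_valid(x, k), b)
                for x in channels
                for k, b in zip(self.kernels, self.biases)]


class EarlyFusionEncoder:
    """Conventional first layer: every filter sums over ALL channels at once."""

    def __init__(self, n_channels: int, n_filters: int = 4, ksize: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.kernels = [[random_kernel(rng, ksize) for _ in range(n_channels)]
                        for _ in range(n_filters)]
        self.biases = [rng.gauss(0.0, 0.1) for _ in range(n_filters)]

    def num_params(self) -> int:
        # Grows linearly with the number of markers.
        return sum(len(k) * len(k[0]) for f in self.kernels for k in f) + len(self.biases)

    def encode(self, channels: List[Patch]) -> List[float]:
        feats = []
        for per_channel, b in zip(self.kernels, self.biases):
            maps = [conv2d_valid(x, k) for x, k in zip(channels, per_channel)]
            # Early fusion: channel responses are summed pixel-wise.
            fused = [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
                     for i in range(len(maps[0]))]
            feats.append(relu_gap(fused, b))
        return feats


if __name__ == "__main__":
    rng = random.Random(1)
    cell = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]
    cim, fused = ChannelIndependentEncoder(), EarlyFusionEncoder(n_channels=3)
    print(len(cim.encode(cell)), cim.num_params())      # 12 features, 40 params
    print(len(fused.encode(cell)), fused.num_params())  # 4 features, 112 params
```

With 49 markers and 4 shared filters, this toy channel-independent encoder would still have 40 parameters and emit a 196-dimensional embedding (49 × 4). The early-fusion layer would instead need 4 × (49 × 9 + 1) = 1,768 parameters, and every one of its features would mix all markers. The design choice being illustrated is that weight sharing across channels keeps the model small while keeping each feature attributable to a single marker.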
