Divide and Conquer Self-Supervised Learning for High-Content Imaging
Lucas Farndale, Paul Henderson, Edward W Roberts, Ke Yuan
TL;DR
This work addresses the challenge that self-supervised learning often prioritizes simple, high-variance features over more subtle, complex patterns in high-content imaging. It introduces SpliCER, a divide-and-conquer training architecture that deconstructs images into components, learns component-specific embeddings, and attaches them to chunks of a primary embedding, using a per-chunk information-theoretic objective to encourage learning of subtle features without suppressing simple ones. Across MNIST-CIFAR, spatial proteomics, and hyperspectral geodata, SpliCER improves downstream performance on tasks requiring complex features and shows benefits from leveraging segmentation masks to reveal informative background information. The approach integrates with existing SSL losses, supports various domains, and offers practical insights for multiplex imaging and segmentation-guided learning, with limitations related to downstream classifier shortcuts and dependence on image deconstruction quality. Overall, SpliCER provides a principled divide-and-conquer strategy to mitigate simplicity bias and unlock richer representations for scientific and engineering imaging applications.
Abstract
Self-supervised representation learning methods often fail to learn subtle or complex features, which can be dominated by simpler patterns that are much easier to learn. This limitation is particularly problematic in science and engineering applications, where complex features can be critical for discovery and analysis. To address this, we introduce Split Component Embedding Registration (SpliCER), a novel architecture that splits the image into sections and distils information from each section, guiding the model to learn more subtle and complex features without compromising on simpler ones. SpliCER is compatible with any self-supervised loss function and can be integrated into existing methods without modification. The primary contributions of this work are as follows: i) we demonstrate that existing self-supervised methods can learn shortcut solutions when simple and complex features are both present; ii) we introduce a novel self-supervised training method, SpliCER, which overcomes the limitations of existing methods and achieves significant downstream performance improvements; iii) we demonstrate the effectiveness of SpliCER in cutting-edge medical and geospatial imaging settings. SpliCER offers a powerful new tool for representation learning, enabling models to uncover complex features that other methods could overlook.
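The chunk-wise idea summarised above (one component embedding registered to one chunk of a primary embedding, with a loss computed per chunk) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `info_nce` and `chunked_loss` are hypothetical, InfoNCE is used only as an assumed stand-in for whichever self-supervised loss is plugged in, and the sketch assumes the primary embedding width divides evenly by the number of components and that each component embedding has the same width as its chunk.

```python
import numpy as np

def info_nce(a, b, temperature=0.1):
    """InfoNCE between two row-aligned batches of embeddings, shape (batch, dim).

    Assumed stand-in loss; row i of `a` and row i of `b` form the positive pair.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                        # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # positives lie on the diagonal

def chunked_loss(primary, component_embeddings, temperature=0.1):
    """Pair chunk k of the primary embedding with component embedding k.

    Hypothetical helper: returns the mean loss and the list of per-chunk losses.
    """
    # Split the primary embedding column-wise into one equal-width chunk per component.
    chunks = np.split(primary, len(component_embeddings), axis=1)
    per_chunk = [info_nce(c, e, temperature)
                 for c, e in zip(chunks, component_embeddings)]
    return float(np.mean(per_chunk)), per_chunk

# Toy usage: 8 images, 3 components, 4 dimensions per chunk (all values synthetic).
rng = np.random.default_rng(0)
primary = rng.normal(size=(8, 12))
components = [rng.normal(size=(8, 4)) for _ in range(3)]
total, per_chunk = chunked_loss(primary, components)
```

Keeping the losses separate per chunk is what allows each component to claim its own slice of the representation, so that an easily learned component cannot satisfy the objective on behalf of a subtler one. In a real training loop the same structure would be written in an automatic-differentiation framework so that gradients reach the encoders.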
