CoBooM: Codebook Guided Bootstrapping for Medical Image Representation Learning
Azad Singh, Deepak Mishra
TL;DR
CoBooM addresses a gap in self-supervised learning for medical images: existing methods overlook the high anatomical similarity between images. It uses a codebook to capture these similarities, combining continuous context and target encodings with a discrete codebook through a Quantizer and fusing the two with DiversiFuse cross-attention in a BYOL-like framework. The approach outperforms multiple SSL baselines on chest X-ray and fundus datasets in both linear probing and semi-supervised settings, with notable gains in classification and segmentation and minimal need for backbone fine-tuning. This provides a scalable, resource-efficient method for learning transferable medical image representations from unlabeled data.
Abstract
Self-supervised learning (SSL) has emerged as a promising paradigm for medical image analysis by harnessing unannotated data. Despite their potential, existing SSL approaches overlook the high anatomical similarity inherent in medical images, which makes it difficult for them to consistently capture the diverse semantic content of these images. This work introduces a novel and generalized solution that implicitly exploits anatomical similarities by integrating codebooks into SSL. The codebook serves as a concise and informative dictionary of visual patterns, which not only aids in capturing nuanced anatomical details but also facilitates the creation of robust and generalized feature representations. In this context, we propose CoBooM, a novel framework for self-supervised medical image learning that integrates continuous and discrete representations. The continuous component preserves fine-grained details, while the discrete component facilitates coarse-grained feature extraction through the structured embedding space. To assess the effectiveness of CoBooM, we conduct a comprehensive evaluation on various medical datasets encompassing chest X-rays and fundus images. The experimental results reveal a significant performance gain in classification and segmentation tasks.
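The interplay of continuous and discrete representations described above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes a nearest-neighbour (L2) codebook lookup for the quantizer, a single-head scaled dot-product attention without learned projections as a stand-in for the fusion step, continuous features as queries and quantized features as keys and values, and a BYOL-style exponential moving average for the target weights. The function names `quantize`, `cross_attention`, and `ema_update` are hypothetical and chosen only for illustration.

```python
import math


def quantize(features, codebook):
    """Replace each continuous vector with its nearest codebook entry (squared L2).

    Assumption: nearest-neighbour lookup, as in standard vector quantization.
    """
    quantized, indices = [], []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        k = dists.index(min(dists))
        indices.append(k)
        quantized.append(list(codebook[k]))
    return quantized, indices


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections.

    Assumption: continuous features act as queries, quantized features as
    keys and values. Each output row is a softmax-weighted average of the
    rows in keys_values.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        fused.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                      for j in range(d)])
    return fused


def ema_update(target, online, tau=0.99):
    """BYOL-style target update: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


# Toy data: two 2-D feature vectors and a codebook with two entries.
features = [[0.9, 0.1], [0.1, 1.2]]
codebook = [[1.0, 0.0], [0.0, 1.0]]

quantized, indices = quantize(features, codebook)
fused = cross_attention(features, quantized)
print(indices)  # [0, 1]
print(fused)
```

In a real system the features would come from a learned backbone, the codebook would be trained jointly with it, and the attention would use learned query, key, and value projections; the sketch only shows how a discrete lookup and a continuous encoding can be combined into one representation.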
