Self-Calibrated Dual Contrasting for Annotation-Efficient Bacteria Raman Spectroscopy Clustering and Classification
Haiming Yao, Wei Luo, Tao Zhou, Ang Gao, Xue Wang
TL;DR
The paper tackles the annotation bottleneck in identifying bacteria from Raman spectra by proposing Self-Calibrated Dual Contrasting (SCDC), a framework that learns from both labeled and unlabeled data using dual contrastive objectives in the embedding and category spaces, augmented views of each spectrum, and a self-calibration loop. Across three large-scale datasets, SCDC remains robust when few labels are available, outperforming multiple unsupervised and semi-supervised baselines and approaching fully supervised results with only a fraction of the annotations. Key contributions include the embedding-level and category-level contrasting mechanisms, a self-calibration strategy based on pseudo-labels, and an ablation study that examines the roles of dual contrasting, augmentation strategies, and hyper-parameter settings. The work matters for practical, annotation-efficient biospectral identification: reducing the need for expert labeling could make pathogen detection from Raman spectra more scalable in clinical settings.
Abstract
Raman spectroscopy (RS) probes molecular vibrations and provides a powerful tool for diagnosing pathogenic bacteria from the unique molecular fingerprint of a substance. The integration of deep learning has significantly improved the efficiency and accuracy of automated RS recognition. However, current RS recognition methods based on deep neural networks still require a large amount of annotated spectral data, which is labor-intensive to produce. This paper presents a novel annotation-efficient Self-Calibrated Dual Contrasting (SCDC) method for RS recognition that operates effectively with few or no annotations. Our core motivation is to represent the spectrum from two different perspectives in two distinct subspaces: embedding and category. The embedding perspective captures instance-level information, while the category perspective reflects category-level information. Accordingly, we apply contrastive learning from both perspectives to obtain discriminative representations, which are applicable to RS recognition under both unsupervised and semi-supervised learning conditions. Furthermore, a self-calibration mechanism is proposed to enhance robustness. Validation on the identification task across three large-scale bacterial Raman spectroscopy datasets demonstrates that our SCDC method achieves robust recognition performance with very few (5$\%$ or 10$\%$) or no annotations, highlighting the potential of the proposed method for biospectral identification in annotation-efficient clinical scenarios.
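To make the two perspectives concrete, the following is a minimal NumPy sketch of one common way to contrast two augmented views at both the instance level and the category level. It is an illustration under stated assumptions, not the SCDC implementation: the abstract does not give the loss, so the InfoNCE form, the temperature value, the equal weighting of the two terms, and the names `info_nce` and `dual_contrastive_loss` are all hypothetical. Rows of the embedding matrices are treated as instances, and columns of the soft class-assignment matrices are treated as categories.

```python
import numpy as np


def info_nce(a, b, temperature=0.5):
    """Symmetric InfoNCE loss; row i of `a` is the positive for row i of `b`.

    Assumed form for illustration only; the abstract does not specify the loss.
    """
    # L2-normalise rows so the dot product is a cosine similarity.
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    logits = a @ b.T / temperature

    def cross_entropy_on_diagonal(l):
        # Numerically stable log-softmax over each row, then pick the positives.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def dual_contrastive_loss(z1, z2, p1, p2, temperature=0.5):
    """Sum of an instance-level and a category-level contrastive term.

    z1, z2: (N, D) embeddings of two augmented views of the same N spectra.
    p1, p2: (N, K) soft class assignments of the same two views.
    Equal weighting of the two terms is an assumption.
    """
    # Embedding space: each spectrum should match its own other view.
    instance_loss = info_nce(z1, z2, temperature)
    # Category space: each class column should match the same class column
    # in the other view, so the assignment matrices are transposed to (K, N).
    category_loss = info_nce(p1.T, p2.T, temperature)
    return instance_loss + category_loss
```

In this sketch, misaligning the two views (for example, pairing each spectrum with a different spectrum's embedding) increases the loss, which is the behaviour a contrastive objective relies on. A semi-supervised variant would add a supervised term on the labeled subset; how SCDC combines these terms with its self-calibration step is not described in the abstract.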
