MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning
Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li
TL;DR
The paper addresses the limitations of unsupervised pretraining for few-shot learning by quantitatively analyzing contrastive learning (CL) and masked image modeling (MIM), then proposing Masked Image Contrastive Modeling (MICM) to combine their strengths. MICM uses an encoder-decoder architecture with a delayed class token, teacher-student self-distillation, and a hybrid loss that combines image reconstruction with contrastive objectives. It is deployed within a two-stage unsupervised few-shot learning (U-FSL) framework that includes OpTA and pseudo-label enhancements. Across MiniImageNet, TieredImageNet, CIFAR-FS, and cross-domain datasets, MICM achieves state-of-the-art results by improving both generalization to novel classes and the discriminability of representations, while remaining adaptable to various FSL strategies. The work advances unsupervised pretraining for U-FSL with a versatile method that reduces reliance on labeled data and improves transferability across diverse tasks.
Abstract
Humans exhibit a remarkable ability to learn quickly from a limited number of labeled samples, a capability that starkly contrasts with that of current machine learning systems. Unsupervised Few-Shot Learning (U-FSL) seeks to bridge this divide by reducing reliance on annotated datasets during initial training phases. In this work, we first quantitatively assess the impacts of Masked Image Modeling (MIM) and Contrastive Learning (CL) on few-shot learning tasks. Our findings highlight the respective limitations of MIM and CL in terms of discriminative and generalization abilities, which contribute to their underperformance in U-FSL contexts. To address these trade-offs between generalization and discriminability in unsupervised pretraining, we introduce a novel paradigm named Masked Image Contrastive Modeling (MICM). MICM combines the targeted object learning strength of CL with the generalized visual feature learning capability of MIM, which significantly improves downstream few-shot learning inference. Extensive experimental analyses confirm the advantages of MICM, demonstrating significant improvements in both generalization and discrimination capabilities for few-shot learning. Our comprehensive quantitative evaluations further substantiate these advantages, showing that our two-stage U-FSL framework based on MICM markedly outperforms existing leading baselines.
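To make the idea of a hybrid objective concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the reconstruction term is a mean squared error over masked patches and uses a teacher-student cross-entropy as a stand-in for the contrastive term. The function names, the temperatures, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruction_loss(pred_patches, target_patches, mask):
    """Mean squared error over masked patches only.

    mask[i] == 1 means patch i was hidden from the encoder (assumed convention).
    """
    total, count = 0.0, 0
    for pred, target, hidden in zip(pred_patches, target_patches, mask):
        if hidden:
            total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
            count += 1
    return total / count if count else 0.0


def distillation_loss(student_logits, teacher_logits,
                      t_student=0.1, t_teacher=0.04):
    """Cross-entropy of the student distribution against the teacher's.

    The temperatures are illustrative assumptions, not values from the paper.
    """
    p_teacher = softmax(teacher_logits, t_teacher)
    p_student = softmax(student_logits, t_student)
    return -sum(pt * math.log(ps + 1e-12)
                for pt, ps in zip(p_teacher, p_student))


def hybrid_loss(pred_patches, target_patches, mask,
                student_logits, teacher_logits, weight=1.0):
    """Reconstruction term plus a weighted distillation term (assumed form)."""
    rec = reconstruction_loss(pred_patches, target_patches, mask)
    dist = distillation_loss(student_logits, teacher_logits)
    return rec + weight * dist
```

In this sketch the reconstruction term only sees masked patches, while the distillation term pulls the student's class-token distribution toward the teacher's. How the two terms are actually balanced in MICM is not stated in the text above, so `weight` is left as a free parameter.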
