
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

Yaohui Li, Qifeng Zhou, Haoxing Chen, Jianbing Zhang, Xinyu Dai, Hao Zhou

TL;DR

CLIP-based few-shot learning is hampered by biased visual knowledge arising from the narrow distribution of scarce examples. The paper introduces Iterative Visual Knowledge Completion (KCL), a training-free, plug-and-play module that uses a mutual-nearest-neighbor confidence rule to iteratively incorporate high-confidence unlabeled test samples into the few-shot support set, refining class centers until convergence. Empirically, KCL delivers substantial gains across 11 datasets in both few-shot and zero-shot settings, outperforming diffusion- and database-based knowledge completion methods while remaining efficient. Ablation studies show that KCL is robust to its hyperparameters, with small unlabeled budgets per class and a multi-modal similarity strategy yielding the best results. The approach offers a practical way to mitigate data scarcity when transferring CLIP, with potential extensions to open-set and continual learning scenarios.

Abstract

Contrastive Language-Image Pre-training (CLIP) has shown powerful zero-shot learning performance. Few-shot learning aims to further enhance the transfer capability of CLIP by providing a few images per class, known as 'few shots'. Most existing methods either implicitly learn from the few shots by incorporating learnable prompts or adapters, or explicitly embed them in a cache model for inference. However, the narrow distribution of few shots often contains incomplete class information, leading to biased visual knowledge with a high risk of misclassification. To tackle this problem, recent methods propose to supplement visual knowledge with generative models or extra databases, which can be costly and time-consuming. In this paper, we propose an Iterative Visual Knowledge CompLetion (KCL) method that complements visual knowledge by properly taking advantage of unlabeled samples, without access to any auxiliary or synthetic data. Specifically, KCL first measures the similarities between unlabeled samples and each category. Then, the samples with the highest confidence for each category are selected and collected according to a designed confidence criterion. Finally, the collected samples are treated as labeled ones and added to the few shots to jointly re-estimate the remaining unlabeled ones. This procedure is repeated for a certain number of iterations, with more and more samples collected, until convergence, ensuring a progressive and robust knowledge completion process. Extensive experiments on 11 benchmark datasets demonstrate the effectiveness and efficiency of KCL as a plug-and-play module under both few-shot and zero-shot learning settings. Code is available at https://github.com/Mark-Sky/KCL.
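
The three steps described in the abstract (measure similarities, collect top-confidence samples, re-estimate with the enlarged labeled pool) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `top_k` and `max_iters` parameters, the use of mean features as class centers, and the specific mutual-nearest-neighbor check (a sample is collected for a class only if it is among that class's `top_k` most similar unlabeled samples and that class is also the sample's most similar class) are all assumptions made for illustration. It also assumes pre-extracted image features and at least one labeled sample per class.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def class_centers(feats, labels, num_classes):
    """Mean feature per class, re-normalized (assumes every class has >= 1 sample)."""
    centers = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(centers)


def iterative_completion(support_feats, support_labels, unlabeled_feats,
                         num_classes, top_k=1, max_iters=10):
    """Hypothetical sketch of iterative knowledge completion.

    Returns (predicted labels for all unlabeled samples,
             boolean mask of samples collected as pseudo-labeled).
    """
    feats = l2_normalize(np.asarray(unlabeled_feats, dtype=np.float64))
    pool_feats = l2_normalize(np.asarray(support_feats, dtype=np.float64))
    pool_labels = np.asarray(support_labels, dtype=np.int64)
    collected = np.zeros(len(feats), dtype=bool)

    for _ in range(max_iters):
        # Step 1: similarity between every unlabeled sample and every class.
        centers = class_centers(pool_feats, pool_labels, num_classes)
        sims = feats @ centers.T
        nearest_class = sims.argmax(axis=1)

        # Step 2: assumed confidence rule (mutual nearest neighbors).
        masked = np.where(collected[:, None], -np.inf, sims)
        new_idx, new_lab = [], []
        for c in range(num_classes):
            for i in np.argsort(-masked[:, c])[:top_k]:
                if not collected[i] and nearest_class[i] == c:
                    new_idx.append(i)
                    new_lab.append(c)

        if not new_idx:
            break  # nothing new was collected: treat as converged

        # Step 3: treat collected samples as labeled and enlarge the pool.
        collected[new_idx] = True
        pool_feats = np.vstack([pool_feats, feats[new_idx]])
        pool_labels = np.concatenate([pool_labels, np.array(new_lab, dtype=np.int64)])

    # Final re-estimation of all unlabeled samples with the completed pool.
    centers = class_centers(pool_feats, pool_labels, num_classes)
    preds = (feats @ centers.T).argmax(axis=1)
    return preds, collected
```

In this sketch a small `top_k` keeps each iteration conservative, so the class centers drift only toward samples that both the class and the sample agree on. For the zero-shot setting, one plausible adaptation (again an assumption) is to pass the class text embeddings as the initial support features.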

Paper Structure (19 sections, 7 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Effectiveness demonstration of KCL. (a) t-SNE visualization of three categories in ImageNet [deng2009imagenet] before and after iterative visual knowledge completion based on 1-shot MaPLe [khattak2023maple]. (b) Classification results before and after using iterative visual knowledge completion on zero-shot CLIP [radford2021learning] and 1-shot MaPLe [khattak2023maple].
  • Figure 2: Further analyses under few-shot settings. (a) The averaged performance on 11 datasets compared with C2A [roy2023Cap2Aug]. (b) The averaged performance of the hyper-parameter sensitivity analysis on 11 datasets based on MaPLe [khattak2023maple]. Note that $^{\dagger}$ indicates that the hyper-parameters $\lambda$ and $\mu$ of KCL are both fixed to 1.
  • Figure 3: Ablation study on $K$ values based on MaPLe [khattak2023maple]. Note that we incorporate all unlabeled samples in the test set for fair comparison.
  • Figure 4: Ablation study on the number of unlabeled samples required for knowledge completion. Note that we conduct experiments based on MaPLe [khattak2023maple] under few-shot settings.
  • Figure 5: Ablation study on the designed confidence criterion based on MaPLe [khattak2023maple].
  • ...and 5 more figures