MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation

Min Zhang, Haoxuan Li, Fei Wu, Kun Kuang

TL;DR

MetaCoCo addresses the vulnerability of few-shot classification (FSC) to spurious correlations by introducing a large-scale benchmark with explicit concept–context pairs drawn from real-world data. It quantifies the extent of spurious-correlation shift with a CLIP-based metric and evaluates a wide range of FSC methods, revealing significant degradation under spurious-context shifts and limited benefit from existing cross-domain or self-supervised approaches. The work provides a unified framework and open-source resources to improve robustness in spurious-correlation FSC (SC-FSC) and to guide the development of context-aware learning strategies. Overall, MetaCoCo highlights the need to model and mitigate non-causal contextual cues in FSC for real-world deployment.

Abstract

Out-of-distribution (OOD) problems in few-shot classification (FSC) occur when novel classes sampled from testing distributions differ from base classes drawn from training distributions, which considerably degrades the performance of deep learning models deployed in real-world applications. Recent studies suggest that the OOD problems in FSC mainly include: (a) cross-domain few-shot classification (CD-FSC) and (b) spurious-correlation few-shot classification (SC-FSC). Specifically, CD-FSC occurs when a classifier learns transferable knowledge from base classes drawn from seen training distributions but must recognize novel classes sampled from unseen testing distributions. In contrast, SC-FSC arises when a classifier relies on non-causal features (or contexts) that happen to be correlated with the labels (or concepts) in base classes, but such relationships no longer hold during model deployment. Although CD-FSC has been extensively studied, SC-FSC remains understudied due to the lack of corresponding evaluation benchmarks. To this end, we present Meta Concept Context (MetaCoCo), a benchmark with spurious-correlation shifts collected from real-world scenarios. Moreover, to quantify the extent of spurious-correlation shifts in MetaCoCo, we further propose a metric that uses CLIP as a pre-trained vision-language model. Extensive experiments on the proposed benchmark evaluate state-of-the-art methods in FSC, cross-domain shifts, and self-supervised learning. The experimental results show that the performance of existing methods degrades significantly in the presence of spurious-correlation shifts. We open-source all code for our benchmark and hope that MetaCoCo can facilitate future research on spurious-correlation shift problems in FSC. The code is available at: https://github.com/remiMZ/MetaCoCo-ICLR24.

Paper Structure (16 sections, 4 equations, 7 figures, 7 tables)


Figures (7)

  • Figure 1: Examples of cross-domain shifts and spurious-correlation shifts in FSC. (a) In Meta-Dataset (Triantafillou et al., 2019), which exhibits cross-domain shifts, the model is trained on base classes sampled from three datasets (miniImageNet, CUB-200-2011, and Aircraft) and then tested on novel classes drawn from VGG Flower. (b) In our proposed MetaCoCo with spurious-correlation shifts, each class (or concept, e.g., dog) appears against different backgrounds (or contexts, e.g., autumn).
  • Figure 2: (a) The sample-averaged similarity $\mathcal{M}_{ce}$ between concepts and images on the existing FSC benchmarks and the proposed MetaCoCo, where MetaCoCo has significantly lower similarity between contexts and images. (b) The context-image similarities $\mathcal{M}_{te}$ (horizontal axis) versus the concept-image similarities $\mathcal{M}_{ce}$ (vertical axis) for the sample points in MetaCoCo.
  • Figure 3: Experiments of the test-tuning phase with different sampling episodes, i.e., IID and OOD.
  • Figure 4: Experiments of different backbone architectures under 5-way and 10-way 1-shot settings.
  • Figure 5: Experimental results of different ways (left) and shots (right) on testing performance.
  • ...and 2 more figures
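
The Figure 2 caption refers to sample-averaged concept-image and context-image similarities. The sketch below illustrates one plausible reading of such a quantity: the mean cosine similarity between each image embedding and its paired text embedding. It is not the paper's exact definition, which is not given in this section. The function names, the use of cosine similarity, and the toy vectors are all assumptions for illustration; in practice the embeddings would come from a vision-language encoder such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sample_averaged_similarity(image_embs, text_embs):
    """Mean cosine similarity over paired (image, text) embeddings.

    Hypothetical helper: assumes the i-th text embedding describes the
    concept (or context) attached to the i-th image.
    """
    sims = [cosine(img, txt) for img, txt in zip(image_embs, text_embs)]
    return sum(sims) / len(sims)


# Toy stand-ins for encoder outputs (assumed, not real CLIP features).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
concept_embs = [[1.0, 0.0], [0.0, 2.0]]  # point the same way as the images
context_embs = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to the images

m_ce = sample_averaged_similarity(image_embs, concept_embs)  # concept-image
m_te = sample_averaged_similarity(image_embs, context_embs)  # context-image
print(m_ce, m_te)
```

With these toy inputs the concept-image average is high and the context-image average is zero, which is the kind of per-sample pair that a scatter plot like Figure 2(b) would place on its two axes.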