Leveraging Biases in Large Language Models: "bias-kNN" for Effective Few-Shot Learning

Yong Zhang, Hanzhang Li, Zhitao Li, Ning Cheng, Ming Li, Jing Xiao, Jianzong Wang

TL;DR

This work addresses biases inherent in large language models that hinder zero-shot and few-shot learning. It introduces bias-kNN, a method that uses the biased LM outputs as kNN features and pairs them with gold labels, enabling robust few-shot text classification across diverse domains and GPT-2 sizes. Across six datasets and multiple model variants, bias-kNN achieves competitive or superior accuracy compared with the Zero-LM and Raw-ICL baselines, while remaining stable under different template and verbalizer choices. The findings suggest that strategically leveraging biases, rather than correcting them, can yield practical advantages for retrieval-based classification in low-resource settings.

Abstract

Large Language Models (LLMs) have shown significant promise in various applications, including zero-shot and few-shot learning. However, their performance can be hampered by inherent biases. Rather than following the traditional approach of minimizing or correcting these biases, this study introduces a novel methodology named "bias-kNN". This approach capitalizes on the biased outputs, harnessing them as primary features for kNN and supplementing them with gold labels. Our comprehensive evaluations, spanning text classification datasets from diverse domains and different GPT-2 model sizes, indicate the adaptability and efficacy of the "bias-kNN" method. Remarkably, this approach not only outperforms conventional in-context learning in few-shot scenarios but also demonstrates robustness across a spectrum of samples, templates, and verbalizers. This study therefore presents a unique perspective on harnessing biases, transforming them into assets for enhanced model performance.
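
The core idea described above can be illustrated with a minimal sketch. It assumes that each example is represented by its per-class label-word logits from a zero-shot prompt, and that a plain majority-vote kNN with Euclidean distance is run over those features using the gold labels of a few training examples. The function names, the distance metric, the value of k, and all numbers are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

LABELS = ["negative", "positive"]  # index 0 = negative, index 1 = positive


def zero_shot_predict(logits):
    """Baseline: pick the class whose label-word logit is largest."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]


def knn_predict(train_feats, train_labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(
        (math.dist(feat, query), label)
        for feat, label in zip(train_feats, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up (negative_logit, positive_logit) pairs from a model biased toward
# "positive": the positive logit is larger for every example, but the gap is
# much smaller for the truly negative ones.
train_feats = [
    (1.0, 3.0), (1.2, 3.1), (0.9, 2.8),   # gold label: positive
    (2.0, 2.4), (2.1, 2.6), (1.9, 2.3),   # gold label: negative
]
train_labels = ["positive"] * 3 + ["negative"] * 3

query = (2.0, 2.5)  # hypothetical logits for an unseen negative review

zero_shot_label = zero_shot_predict(query)
knn_label = knn_predict(train_feats, train_labels, query)
print(zero_shot_label, knn_label)
```

In this toy setup the argmax baseline follows the bias and labels every example "positive", while the nearest neighbours of the query in logit space all carry the gold label "negative". The biased logits are still informative as features even though their argmax is not.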

Paper Structure (17 sections, 2 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Zero-shot probability and logit results from the CR train dataset, visualizing 50 samples each from the Positive and Negative categories. The model exhibits a clear bias towards the Positive category. The dashed line $y=x$ denotes the decision boundary for these categories.
  • Figure 2: The architecture of our proposed model
  • Figure 3: Verbalizer: robustness analysis. The shaded region denotes the standard deviation. All figures are consistent. Dashed lines of the same color indicate the Zero-LM accuracy for the corresponding settings.
  • Figure 4: Template: robustness analysis
  • Figure 5: Performance of biased logit as feature
  • ...and 2 more figures