
NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning

Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue

TL;DR

This paper addresses calibration and performance gaps in In-Context Learning (ICL) by proposing NoisyICL, a lightweight approach that perturbs pre-trained model parameters with Gaussian noise before performing ICL. The perturbation is $\theta_i' = (1-\lambda)\theta_i + \lambda \mathcal{N}(0,\sigma^2)$, where $\lambda$ is the noise intensity, and stronger noise is linked to higher token entropy, suggesting fairer predictions. Evaluated on GPT-2 and GPT-J across 12 classification datasets, NoisyICL yields about a 10% average improvement in ICL performance and produces better-calibrated outputs, reducing miscalibration metrics such as $ECE_1$ by roughly 25% in many cases. The findings imply that NoisyICL acts as a calibration bridge between pre-training and ICL, offering a low-cost alternative to fine-tuning and guiding future work on noise scheduling and layer-wise perturbations.
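To make the update rule concrete, here is a minimal Python sketch of the perturbation $\theta_i' = (1-\lambda)\theta_i + \lambda \mathcal{N}(0,\sigma^2)$ applied to a toy parameter dictionary. It assumes one independent Gaussian draw per scalar parameter and treats $\sigma$ as a caller-supplied hyperparameter; the helper name `perturb_parameters` and the dict-of-lists layout are hypothetical stand-ins for real weight tensors, not the authors' released code.

```python
import random
from typing import Dict, List, Optional


def perturb_parameters(
    params: Dict[str, List[float]],
    lam: float,
    sigma: float,
    seed: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Hypothetical helper: return a noised copy of `params`.

    Each scalar becomes (1 - lam) * theta + lam * eps with eps ~ N(0, sigma^2).
    Assumption: one independent noise draw per scalar; the input is not mutated.
    """
    rng = random.Random(seed)  # seeded generator so the noise is reproducible
    noised: Dict[str, List[float]] = {}
    for name, values in params.items():
        # Interpolate each pre-trained value toward a fresh Gaussian sample.
        noised[name] = [(1.0 - lam) * theta + lam * rng.gauss(0.0, sigma) for theta in values]
    return noised


if __name__ == "__main__":
    weights = {"layer0.weight": [0.5, -1.2, 0.3], "layer0.bias": [0.0, 0.1]}
    print(perturb_parameters(weights, lam=0.1, sigma=1.0, seed=0))
```

Setting `lam=0.0` leaves the weights unchanged and `lam=1.0` replaces them entirely with noise, so $\lambda$ interpolates between the pre-trained model and pure noise.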

Abstract

In-Context Learning (ICL) suffers from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance at the cost of enormous datasets and computing resources. In this paper, we propose NoisyICL, which simply perturbs the model parameters with random noise, aiming for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to make fairer predictions with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is available on GitHub.

Paper Structure (36 sections, 6 equations, 28 figures, 7 tables)

Figures (28)

  • Figure 1: Upper: A sketch of NoisyICL. Unlike previous work, which fine-tuned LMs toward ICL tasks, we perturb LMs with random noise sampled from the normal distribution $\mathcal{N}(0,\sigma^2)$ at intensity $\lambda$, then perform ICL. Lower: The average accuracy of ICL with and without NoisyICL w.r.t. the number of demos.
  • Figure 2: The correlation between the normalized token entropy $H_n^t$ and the noise intensity $\lambda$ with empty inputs. As the noise gets stronger, $H_n^t$ increases, indicating a fairer output.
  • Figure 3: The normalized label entropy $H_n^l$ on both models and 7 datasets with and without NoisyICL at an appropriate noise intensity. In most cases, $H_n^l$ with NoisyICL (w/) is greater than without NoisyICL (w/o).
  • Figure 4: Left: Reliability diagrams (sparse bars) and global confidence distribution (dense bars) of GPT-2 on hate_speech18 with (w/, $ECE_1 = 9.01\%$) and without (w/o, $ECE_1 = 28.14\%$) NoisyICL. Predictions are grouped into bins by confidence, and the accuracy of each bin is shown as a histogram bar. The grey bars mark ideal calibration, which the diagram with NoisyICL matches more closely. Right upper: Confidence distribution on correct predictions, shifted relatively to the right with NoisyICL. Right lower: Confidence distribution on wrong predictions, shifted relatively to the left with NoisyICL. ($k=4$)
  • Figure 5: The relationship between the number of demos and accuracy in selected cases. NoisyICL enables the model to learn from the demos correctly.
  • ...and 23 more figures
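Figures 2 to 4 rely on two standard quantities: normalized entropy ($H_n^t$ over next-token probabilities, $H_n^l$ over labels) and the L1 expected calibration error $ECE_1$ behind the reliability diagrams. The Python sketch below assumes normalization by $\log K$ for a $K$-way distribution and 10 equal-width confidence bins for $ECE_1$; the function names are hypothetical, and the paper's exact normalization, binning, and aggregation over a dataset may differ.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a K-way distribution divided by log(K), giving a value in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0  # a single outcome carries no uncertainty
    h = -sum(p * math.log(p) for p in probs if p > 0.0)  # 0 * log 0 is taken as 0
    return h / math.log(k)


def ece_1(confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10) -> float:
    """L1 expected calibration error over equal-width confidence bins on [0, 1]."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("confidences and correct must be non-empty and equally long")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confidences):
        # Bin index floor(c * n_bins); a confidence of exactly 1.0 falls into the last bin.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    error = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(1 for i in members if correct[i]) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        error += len(members) / n * abs(acc - conf)  # bin weight times |accuracy - confidence|
    return error


if __name__ == "__main__":
    print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))  # uniform distribution: maximal value
    print(ece_1([0.9, 0.9, 0.6, 0.6], [True, True, True, False]))
```

Both functions take plain probability and confidence lists, so they apply equally to a model's outputs with or without noise.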