Let the Fuzzy Rule Speak: Enhancing In-context Learning Debiasing with Interpretability

Ruixi Lin, Yang You

TL;DR

This work addresses imbalanced per-class accuracy in in-context learning by introducing FuRud, a post-hoc, interpretable debiasing method that performs per-sample, per-class probability corrections via fuzzy membership functions. FuRud optimizes a set of triangular membership functions for each class using simulated annealing to minimize the class-accuracy bias COBias while maximizing overall accuracy, without updating the underlying LLM. Across seven benchmarks, FuRud yields a relative 21% improvement in accuracy and a 56% reduction in COBias on average, while providing per-sample interpretability of why and how corrections are applied. The approach demonstrates strong performance across diverse datasets and models, with favorable comparisons to state-of-the-art debiasing methods and robust applicability to different ICL settings and prompting strategies.
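The two quantities traded off above can be sketched in a few lines. This page does not define COBias, so the sketch assumes it is the mean absolute pairwise difference in per-class accuracy; the function names `class_accuracies` and `cobias` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def class_accuracies(y_true, y_pred, num_classes):
    """Per-class accuracy: fraction of each class's instances predicted correctly."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    # Classes with no instances get accuracy 0.0 to avoid division by zero.
    return [c / n if n else 0.0 for c, n in zip(correct, total)]


def cobias(accs):
    """ASSUMED definition: mean absolute difference over all pairs of class accuracies."""
    pairs = list(combinations(accs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else 0.0


# Toy example (made-up labels): class 0 is over-predicted, class 1 is under-predicted.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
accs = class_accuracies(y_true, y_pred, num_classes=2)
overall = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
bias = cobias(accs)
```

In this toy case the per-class accuracies are 1.0 and 0.25, so a debiasing method would aim to shrink the gap between them without lowering the overall accuracy.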

Abstract

Large language models (LLMs) often struggle with balanced class accuracy in text classification tasks using in-context learning (ICL), hindering some practical uses due to user dissatisfaction or safety risks caused by misclassifications. Retraining LLMs to address root causes in data or model priors is neither easy nor cost-effective. This paper delves deeper into the class accuracy imbalance issue, identifying that it arises because certain classes consistently receive disproportionately high ICL probabilities, causing under-prediction and lower accuracy for others. More importantly, probability ranges affect the imbalance differently, allowing for precise, range-specific corrections. We introduce FuRud (Fuzzy Rule Optimization-based Debiasing), a method for sample-level class probability correction. FuRud tackles interpretability challenges by determining why certain classes need corrections and by tailoring adjustments to each instance's class probabilities. It is powered by fuzzy sets with triangular membership functions, which transform a class probability based on the range it belongs to. By solving a nonlinear integer programming problem on a labeled set of ICL class probabilities to minimize class accuracy bias (COBias) and maximize overall accuracy, each class selects an optimal correction function from 19 triangular membership functions without updating the LLM, and the selected functions correct test instances at inference. Across seven benchmark datasets, FuRud reduces COBias by over half (56%) and improves overall accuracy by a relative 21%, outperforming state-of-the-art debiasing methods.
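As a rough illustration of range-specific correction, the sketch below implements a generic triangular membership function and applies one function per class before taking the argmax. It assumes that the corrected score of a class is simply the membership value of its ICL probability. The parameter triples, the helper name `correct_and_predict`, and the toy probabilities are hypothetical; the paper's actual 19 functions and its optimization procedure are not reproduced here.

```python
def triangular(p, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a <= b <= c)."""
    if p < a or p > c:
        return 0.0
    if p == b:
        return 1.0  # also covers shoulder shapes where a == b or b == c
    if p < b:
        return (p - a) / (b - a)  # rising edge
    return (c - p) / (c - b)      # falling edge


def correct_and_predict(probs, params):
    """Apply one (a, b, c) function per class, then predict the highest corrected score.

    ASSUMPTION: the corrected score is the membership value itself.
    """
    corrected = [triangular(p, *abc) for p, abc in zip(probs, params)]
    prediction = max(range(len(corrected)), key=corrected.__getitem__)
    return corrected, prediction


# Hypothetical two-class case: class 0 keeps its probability (mu(p) = p on [0, 1]),
# while class 1 gets its medium range amplified by a function peaking at 0.5.
params = [(0.0, 1.0, 1.0), (0.0, 0.5, 1.0)]
probs = [0.55, 0.45]  # uncorrected argmax would be class 0
corrected, prediction = correct_and_predict(probs, params)
```

Here the correction lifts class 1 from 0.45 to 0.9 while class 0 stays at 0.55, so the prediction flips to class 1. Because each class has a single piecewise-linear function, one can read off which probability range triggered the change for a given instance.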

Paper Structure

This paper contains 17 sections, 7 equations, 7 figures, 6 tables.

Figures (7)

  • Figure 1: An overview of FuRud. ICL output probabilities across answer classes for input instances are obtained. On an optimization set, class probabilities and ground-truth labels are input to the FuRud multi-objective nonlinear integer programming model for joint learning of optimal membership functions. During inference, the optimal membership functions perform tailored corrections to class probabilities for test instances. This figure is for illustration purposes only; actual range changes and improvements are detailed in Section \ref{sec:exp}.
  • Figure 2: 19 triangular membership functions.
  • Figure 3: Class probabilities before and after applying corrections. For each task, we report results of the seed 1 run out of 3 runs. There was a stark ICL accuracy difference of 37% between True and False on RTE. FuRud addresses it by amplifying the medium range of False and simultaneously reducing the relatively high range of True.
  • Figure 4: Quantitative evaluation on the ratio of instances that benefit from the correction, exemplified by class Business of AGNews. The difference of "Acc of examples" between bottom and top subfigures represents the ratio. The red color highlights the activated pieces of the membership function for range-specific correction.
  • Figure 5: Accuracy-COBias tradeoff of fuzzy partitions.
  • ...and 2 more figures