Let the Fuzzy Rule Speak: Enhancing In-context Learning Debiasing with Interpretability
Ruixi Lin, Yang You
TL;DR
This work addresses imbalanced per-class accuracy in in-context learning by introducing FuRud, a post-hoc, interpretable debiasing method that performs per-sample, per-class probability corrections via fuzzy membership functions. FuRud selects a triangular membership function for each class using simulated annealing to minimize the class accuracy bias (COBias) while maximizing overall accuracy, without updating the underlying LLM. Across seven benchmarks, FuRud yields a 21% relative improvement in accuracy and a 56% reduction in COBias on average, while providing per-sample interpretability of why and how corrections are applied. The approach compares favorably with state-of-the-art debiasing methods and applies to different models, ICL settings, and prompting strategies.
Abstract
Large language models (LLMs) often struggle with balanced class accuracy in text classification tasks using in-context learning (ICL), hindering some practical uses due to user dissatisfaction or safety risks caused by misclassifications. Retraining LLMs to address root causes in data or model priors is neither easy nor cost-effective. This paper delves deeper into the class accuracy imbalance issue, identifying that it arises because certain classes consistently receive disproportionately high ICL probabilities, causing under-prediction and lower accuracy for others. More importantly, different probability ranges affect the imbalance differently, allowing for precise, range-specific corrections. We introduce FuRud (Fuzzy Rule Optimization-based Debiasing), a method for sample-level class probability correction. FuRud tackles interpretability challenges by determining why certain classes need corrections and by tailoring adjustments to each instance's class probabilities. It is powered by fuzzy sets with triangular membership functions, which transform a class probability based on the range it belongs to. By solving a nonlinear integer programming problem on a labeled set of ICL class probabilities to minimize class accuracy bias (COBias) and maximize overall accuracy, each class selects an optimal correction function from 19 triangular membership functions without updating the LLM, and the selected functions correct test instances at inference. Across seven benchmark datasets, FuRud reduces COBias by over half (56%) and improves overall accuracy by a relative 21%, outperforming state-of-the-art debiasing methods.
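
To make the correction step concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that a corrected class probability is simply the triangular membership value of the original probability, and that COBias is the mean absolute difference between the accuracies of all class pairs; the abstract states neither formula. The helper names (triangular, correct, cobias), the triangle parameters, and the toy data are hypothetical, and the search over candidate functions (simulated annealing in the TL;DR) is not shown.

```python
from itertools import combinations

import numpy as np


def triangular(p, a, b, c):
    """Membership of p in a triangle with feet a, c and peak b (requires a < b < c)."""
    p = np.asarray(p, dtype=float)
    rising = (p - a) / (b - a)    # grows from 0 at a to 1 at b
    falling = (c - p) / (c - b)   # shrinks from 1 at b to 0 at c
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)


def correct(probs, class_params):
    """Apply one (a, b, c) triangle per class; None leaves that class unchanged."""
    corrected = np.array(probs, dtype=float)
    for k, params in enumerate(class_params):
        if params is not None:
            corrected[:, k] = triangular(corrected[:, k], *params)
    return corrected


def accuracy(preds, labels):
    return float(np.mean(preds == labels))


def cobias(preds, labels, n_classes):
    """ASSUMED definition: mean absolute pairwise difference of class accuracies."""
    accs = [np.mean(preds[labels == k] == k) for k in range(n_classes)]
    return float(np.mean([abs(x - y) for x, y in combinations(accs, 2)]))


# Hypothetical toy data: class 0 always receives the higher ICL probability.
probs = np.array([[0.90, 0.10],
                  [0.60, 0.40],
                  [0.55, 0.45],
                  [0.80, 0.20]])
labels = np.array([0, 1, 1, 0])

preds_before = probs.argmax(axis=1)
acc_before = accuracy(preds_before, labels)
bias_before = cobias(preds_before, labels, n_classes=2)

# Hypothetical selection: class 0 uses a triangle peaking at 1.0, which
# shrinks mid-range probabilities more than high ones; class 1 is untouched.
class_params = [(0.5, 1.0, 1.5), None]
preds_after = correct(probs, class_params).argmax(axis=1)
acc_after = accuracy(preds_after, labels)
bias_after = cobias(preds_after, labels, n_classes=2)

print(acc_before, bias_before, acc_after, bias_after)
```

In this toy case every uncorrected prediction is class 0, so class 1 is never predicted. The range-specific correction to class 0 lowers its mid-range probabilities enough for class 1 to win on the two samples where it is the true label, while the high-confidence class 0 samples are unaffected.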
