Self-Supervised Position Debiasing for Large Language Models

Zhongkun Liu; Zheng Chen; Mengqi Zhang; Zhaochun Ren; Pengjie Ren; Zhumin Chen

TL;DR

This work tackles position bias in fine-tuned large language models by proposing SOD, a self-supervised debiasing framework built from three components: Low-bias Inference, which collects unsupervised responses from the pre-trained LLM that carry fewer positional cues; Objective Alignment (OAM), which prunes low-quality candidates; and Multi-Objective Optimization, which fine-tunes the model on a blend of the target task objective and the debiasing objective. Across eight datasets and five tasks, the method consistently improves performance on non-biased samples and outperforms existing baselines, at the cost of only a small drop on biased samples. Crucially, SOD debiases without external bias annotations or annotated non-biased data, which makes it practical where such resources are unavailable. The authors provide code and discuss limitations, including dependence on the quality of the responses generated by the pre-trained LLM and potential noise in the aligned unsupervised signals, and outline paths for future improvement.
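
As a rough illustration of how the pruning and optimization steps described above might fit together, the sketch below filters unsupervised candidates by a similarity score against the target response and then blends two loss values with a weight α. The token-overlap score, the threshold, the function names, and the convention that α weights the debiasing term are all assumptions made for illustration; the paper's actual alignment criterion and loss formulation may differ.

```python
from typing import List


def token_f1(candidate: str, target: str) -> float:
    """Token-overlap F1 between two strings.

    Stand-in alignment score (an assumption, not the paper's criterion).
    """
    cand, tgt = candidate.lower().split(), target.lower().split()
    if not cand or not tgt:
        return 0.0
    remaining = list(tgt)  # target tokens not yet matched
    overlap = 0
    for tok in cand:
        if tok in remaining:
            remaining.remove(tok)
            overlap += 1
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(tgt)
    return 2 * precision * recall / (precision + recall)


def prune_candidates(candidates: List[str], target: str,
                     threshold: float = 0.5) -> List[str]:
    """Keep candidates whose score against the target reaches the threshold.

    The threshold value is hypothetical.
    """
    return [c for c in candidates if token_f1(c, target) >= threshold]


def blended_loss(task_loss: float, debias_loss: float, alpha: float) -> float:
    """Convex combination of two losses.

    Assumed convention: alpha weights the debiasing term.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * task_loss + alpha * debias_loss
```

Under this assumed convention, setting `alpha` to 0 recovers plain fine-tuning on the target objective, while larger values put more weight on the pruned unsupervised responses.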

Abstract

Fine-tuning has been demonstrated to be an effective method to improve the domain performance of large language models (LLMs). However, LLMs might fit dataset biases and shortcuts for prediction, leading to poor generation performance. Previous work has shown that LLMs are prone to position bias, i.e., leveraging information positioned at the beginning or end of the input, or specific positional cues within it. Existing debiasing methods for LLMs require external bias knowledge or annotated non-biased samples, which are lacking for position debiasing and impractical to obtain in reality. In this work, we propose a self-supervised position debiasing (SOD) framework to mitigate position bias for LLMs. SOD leverages unsupervised responses from pre-trained LLMs for debiasing without relying on any external knowledge. To improve the quality of unsupervised responses, we propose an objective alignment (OAM) module to prune these responses. Experiments on eight datasets and five tasks show that SOD consistently outperforms existing methods in mitigating three types of position bias. Moreover, SOD achieves this while sacrificing only a small amount of performance on biased samples, showing that it is both general and effective. To facilitate the reproducibility of the results, we share the code of all methods and datasets at https://github.com/LZKSKY/SOD.

Paper Structure (32 sections, 9 equations, 6 figures, 17 tables)

Introduction
Preliminary
Task Definition
Large Language Model
Method
Low-bias Inference
Objective Alignment
Multi-Objective Optimization
Experiments
Datasets
Evaluation Metrics
Baseline Methods
Implementation Details
Results
Analysis
...and 17 more sections

Figures (6)

Figure 1: Question answering performance of FlanT5-large (T5) and fine-tuned FlanT5-large (FT) over different relative positions in CANARD. Relative position means the distance of grounded utterances between the last turn answer and the current turn answer.
Figure 2: Overview of our proposed SOD framework (taking CQG as the example). First, the low-bias inference module collects multiple unsupervised questions from the LLM. Then, the objective alignment module aligns these questions with the target question. Finally, these aligned questions are utilized for fine-tuning within the multi-objective optimization module.
Figure 3: Performance (%) of four tasks over each $\alpha$. The x-axis denotes the value of $\alpha$ and the y-axis denotes the ROUGE-L score on non-biased datasets.
Figure 4: Performance (%) of four tasks over different numbers of training samples. The x-axis denotes the number of training samples and the y-axis denotes the ROUGE-L score on non-biased datasets.
Figure 5: Performance over each $\alpha$ on all datasets. The x-axis denotes the value of $\alpha$ and the y-axis denotes the ROUGE-L score on non-biased datasets.
...and 1 more figure
