Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO

Xin Yang, Letian Li, Abudukelimu Wuerkaixi, Xuxin Cheng, Cao Liu, Ke Zeng, Xunliang Cai, Wenyuan Jiang

TL;DR

CoIPO, a Contrastive Learning-based Inverse Direct Preference Optimization method, minimizes the discrepancy between the label-aligned logits that the model produces under a clean prompt and under its noisy counterpart. The method is analyzed in detail using mutual information theory.

Abstract

Large language models (LLMs) have demonstrated remarkable and steadily improving performance across a wide range of tasks. However, LLM performance can be highly sensitive to prompt variations, especially in scenarios with limited openness or strict output formatting requirements, which indicates insufficient robustness. In real-world applications, user prompts often contain imperfections that may undermine the quality of the model's responses. To address this issue, previous work has primarily focused on preprocessing prompts, employing external tools or even other LLMs to refine prompt formulations in advance. However, these approaches overlook the intrinsic robustness of LLMs, and their reliance on external components introduces additional computational overhead and uncertainty. In this work, we propose a Contrastive Learning-based Inverse Direct Preference Optimization (CoIPO) method that minimizes the discrepancy between the label-aligned logits produced by the model under a clean prompt and under its noisy counterpart, and we conduct a detailed analysis using mutual information theory. For training, we augment the FLAN dataset by constructing paired prompts, each consisting of a clean prompt and its corresponding noisy version. To evaluate the method's effectiveness, we develop NoisyPromptBench, a benchmark derived from and extending the existing PromptBench. Experiments on NoisyPromptBench show that the proposed method achieves a significant improvement in average accuracy over the current state-of-the-art approaches. The source code of CoIPO, the pair-wise FLAN datasets, and NoisyPromptBench are available at https://github.com/vegetable-yx/CoIPO.
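
The paired-prompt construction described in the abstract can be illustrated with a minimal sketch. The abstract does not say which noise types are used to augment FLAN, so the adjacent-character swap below is an assumed, typo-style noise model, and the names `add_char_noise` and `make_pair` are hypothetical helpers rather than functions from the released code.

```python
import random


def add_char_noise(prompt: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent alphabetic characters with probability `rate`.

    Assumed noise model for illustration only; not taken from the paper.
    """
    chars = list(prompt)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def make_pair(clean_prompt: str, label: str, rate: float = 0.1, seed: int = 0) -> dict:
    """Build one training record: a clean prompt, its noisy version, and the shared label."""
    rng = random.Random(seed)  # seeded so the same pair is produced on every run
    return {
        "clean": clean_prompt,
        "noisy": add_char_noise(clean_prompt, rate, rng),
        "label": label,
    }


pair = make_pair("Is the following review positive or negative?", "positive", rate=0.2)
print(pair["clean"])
print(pair["noisy"])
```

Both prompts in a pair share the same label, which is what allows the clean and noisy outputs to be compared at the label positions during training.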

Paper Structure (38 sections, 18 equations, 7 figures, 18 tables)

Figures (7)

  • Figure 1: The performance degradation of the models under different perturbation scenarios. "Llama" denotes the Llama2-7B model and "Qwen" represents the Qwen2.5-7B model.
  • Figure 2: External tools incur additional time and monetary costs, limit adaptability, and introduce cumulative errors. In contrast, self-robustness requires only offline training, without the associated challenges.
  • Figure 3: Framework of CoIPO: The clean prompt and its corresponding perturbed version (in blue text), along with an unrelated prompt (in green text), are each first concatenated with the label. The LLM then computes the logits for each input. Logits 1 is preferred, while Logits 2 is dispreferred. Subsequently, following the principles of contrastive learning, a KL-divergence-based similarity to Logits 1' is calculated for both Logits 1 and Logits 2, with the goal of maximizing the similarity for the former and minimizing it for the latter.
  • Figure 4: Schematic diagram of perturbation radii. Dots represent various perturbations, and their distance from the center of the circle is the perturbation radius. The perturbation radius is quantified by the performance degradation rate.
  • Figure 5: Trend chart illustrating the decline in performance with increasing perturbations.
  • ...and 2 more figures
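
The contrastive objective outlined in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's loss: the exact CoIPO formulation is not given in this section, so the sketch substitutes a generic two-way InfoNCE-style loss with negative KL divergence as the similarity. It treats Logits 1' as the anchor, Logits 1 as the preferred side, and Logits 2 as the dispreferred side, and it works on a single vocabulary distribution at one label position. The function names and the `temperature` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def contrastive_kl_loss(anchor_logits, preferred_logits, dispreferred_logits,
                        temperature=1.0):
    """Assumed InfoNCE-style loss with similarity defined as negative KL divergence.

    The loss falls as the preferred logits move closer to the anchor and as the
    dispreferred logits move away from it.
    """
    anchor = softmax(anchor_logits)
    sim_pos = -kl_divergence(anchor, softmax(preferred_logits))
    sim_neg = -kl_divergence(anchor, softmax(dispreferred_logits))
    # -log(exp(sp/t) / (exp(sp/t) + exp(sn/t))) simplifies to log(1 + exp((sn - sp)/t))
    return math.log1p(math.exp((sim_neg - sim_pos) / temperature))


anchor = [2.0, 0.5, -1.0]        # stands in for Logits 1'
preferred = [1.9, 0.6, -1.1]     # stands in for Logits 1
dispreferred = [-1.0, 0.5, 2.0]  # stands in for Logits 2 (unrelated prompt)
print(contrastive_kl_loss(anchor, preferred, dispreferred))
```

When both candidates are equally similar to the anchor, the loss equals log 2; it drops below that value only when the preferred logits are the closer of the two.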