PEARL: Towards Permutation-Resilient LLMs

Liang Chen, Li Shen, Yang Deng, Xiaoyan Zhao, Bin Liang, Kam-Fai Wong

TL;DR

This work tackles the vulnerability of in-context learning to demonstration permutations in large language models. It proposes Permutation-resilient Learning (PEARL), a distributionally robust optimization (DRO) framework where a permutation-proposal network (P-Net) adversarially generates challenging demonstration orders, guided by an optimal-transport formulation solved via the Sinkhorn algorithm. The method formalizes a DRO objective over all possible permutations and trains the P-Net and LLM in a minimax loop, yielding improved worst-case and generalization performance on both synthetic linear-function tasks and real instruction-tuning benchmarks, with gains up to 24–40% in many-shot and long-context regimes. By addressing order-robustness at training time, PEARL reduces vulnerability to permutation attacks and offers a scalable approach for safer, more reliable ICL in diverse tasks and models, with broader applicability to other set-structured inputs.
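As a rough sketch of how the DRO objective differs from standard training, the two can be written side by side. The notation here is an assumption made for illustration, not taken from the paper: $p$ is the set of demonstrations, $(x, y)$ the query and target, $\hat{P}$ the empirical training distribution, $\ell$ the per-example loss, and $\mathbb{P}$ the set of all permutations of the demonstrations.

$$
\hat{\theta}_{\mathrm{ERM}} = \arg\min_{\theta} \; \mathbb{E}_{(p, x, y) \sim \hat{P}} \big[ \ell\big(\theta; (p, x, y)\big) \big]
$$

$$
\hat{\theta}_{\mathrm{DRO}} = \arg\min_{\theta} \; \mathbb{E}_{(p, x, y) \sim \hat{P}} \Big[ \max_{\Pi \in \mathbb{P}} \ell\big(\theta; (\Pi \cdot p, x, y)\big) \Big]
$$

Under this reading, ERM fits only the orderings that happen to appear in the training data, while the DRO form minimizes the loss under the hardest ordering of each example. Since the inner maximum ranges over $n!$ orderings for $n$ demonstrations, exhaustive search quickly becomes impractical, which is the role the learned P-Net plays in the minimax loop.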

Abstract

The in-context learning (ICL) capability of large language models (LLMs) enables them to perform challenging tasks using provided demonstrations. However, ICL is highly sensitive to the ordering of demonstrations, leading to instability in predictions. This paper shows that this vulnerability can be exploited to design a natural attack, difficult for model providers to detect, that achieves a success rate of nearly 80% on LLaMA-3 by simply permuting the demonstrations. Existing mitigation methods primarily rely on post-processing and fail to enhance the model's inherent robustness to input permutations, raising concerns about the safety and reliability of LLMs. To address this issue, we propose Permutation-resilient learning (PEARL), a novel framework based on distributionally robust optimization (DRO), which optimizes model performance against the worst-case input permutation. Specifically, PEARL consists of a permutation-proposal network (P-Net) and the LLM. The P-Net generates the most challenging permutations by casting permutation generation as an optimal transport problem, which is solved using an entropy-constrained Sinkhorn algorithm. Through minimax optimization, the P-Net and the LLM iteratively optimize against each other, progressively improving the LLM's robustness. Experiments on synthetic pre-training and real-world instruction tuning tasks demonstrate that PEARL effectively mitigates permutation attacks and enhances performance. Notably, despite being trained on fewer shots and shorter contexts, PEARL achieves performance gains of up to 40% when scaled to many-shot and long-context scenarios, highlighting its efficiency and generalization capabilities.
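To make the Sinkhorn step more concrete, the following is a minimal, generic sketch of Sinkhorn normalization in pure Python. It is not the paper's P-Net: the function names, the temperature `tau`, the iteration count, and the toy score matrix are all assumptions chosen for illustration. It shows how a square matrix of scores can be turned into an approximately doubly stochastic "soft permutation", and how such a matrix could then reorder (here, mix) a sequence of demonstration embeddings.

```python
import math


def _logsumexp(values):
    # Numerically stable log(sum(exp(v))) for a list of floats.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, tau=1.0, n_iters=100):
    """Map a square score matrix to an approximately doubly stochastic matrix.

    Alternates row and column normalization in log space. `tau` is a
    temperature (an assumed hyperparameter): smaller values push the
    result closer to a hard permutation matrix.
    """
    n = len(scores)
    log_p = [[s / tau for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: make every row sum to 1 after exponentiation.
        for i in range(n):
            z = _logsumexp(log_p[i])
            log_p[i] = [v - z for v in log_p[i]]
        # Column step: make every column sum to 1 after exponentiation.
        for j in range(n):
            z = _logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= z
    return [[math.exp(v) for v in row] for row in log_p]


def apply_soft_permutation(perm, embeddings):
    """Row i of the output is sum_j perm[i][j] * embeddings[j]."""
    dim = len(embeddings[0])
    return [
        [sum(perm[i][j] * embeddings[j][d] for j in range(len(embeddings)))
         for d in range(dim)]
        for i in range(len(perm))
    ]


# Toy example: 3 demonstrations, each with a 2-dimensional embedding.
scores = [
    [2.0, 0.0, 1.0],
    [0.5, 1.5, 0.0],
    [0.0, 1.0, 2.0],
]
soft_perm = sinkhorn(scores, tau=1.0, n_iters=100)
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
permuted = apply_soft_permutation(soft_perm, embeddings)

row_sums = [sum(row) for row in soft_perm]
col_sums = [sum(soft_perm[i][j] for i in range(3)) for j in range(3)]
```

In an adversarial setup of the kind the abstract describes, the scores would come from a trainable network rather than a fixed table, and the relaxation keeps the choice of ordering differentiable so that gradients can flow back to that network.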

Paper Structure

This paper contains 34 sections, 14 equations, 8 figures, 12 tables, 1 algorithm.

Figures (8)

  • Figure 1: Performance and attack success rates of Llama-3 on CurDial and TMW datasets. Left panels: Random, average and worst-case performance as a function of shot number. Right panels: Attack success rates for exhaustive and neural search attack methods at different thresholds.
  • Figure 1: Normalized MSE across permutations.
  • Figure 2: Comparison of models trained under ERM and DRO paradigms. The blue bars represent the empirical distribution $\hat{P}$ of the training data, showing the different frequencies of six permutations in the training set. The purple curves denote the distribution $P_\theta$ learned by (a) ERM and (b) DRO models, illustrating their different behaviors on less frequent but valid permutations.
  • Figure 3: An overview of the learning framework. The P-Net is a small model incorporating the Sinkhorn operator, trained jointly with the LLM under the adversarial optimization algorithm. Note that the permutation matrix operates on the input sequence's embeddings (simplified here as text sequences for clarity). After training, only the LLM is retained while the P-Net is discarded.
  • Figure 4: Comparison of attack success rates.
  • ...and 3 more figures