Position Engineering: Boosting Large Language Models through Positional Information Manipulation

Zhiyuan He; Huiqiang Jiang; Zilong Wang; Yuqing Yang; Luna Qiu; Lili Qiu

TL;DR

This paper introduces position engineering, a technique that offers a more efficient way to guide large language models and substantially improves upon the baseline in both retrieval-augmented generation (RAG) and in-context learning (ICL).

Abstract

The performance of large language models (LLMs) is significantly influenced by the quality of the prompts provided. In response, researchers have developed numerous prompt engineering strategies that modify the prompt text to enhance task performance. In this paper, we introduce a novel technique termed position engineering, which offers a more efficient way to guide large language models. Unlike prompt engineering, which requires substantial effort to modify the text provided to LLMs, position engineering only alters the positional information in the prompt, leaving the text itself unchanged. We have evaluated position engineering in two widely used LLM scenarios: retrieval-augmented generation (RAG) and in-context learning (ICL). Our findings show that position engineering substantially improves upon the baseline in both cases. Position engineering thus represents a promising new strategy for exploiting the capabilities of large language models.
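The abstract describes position engineering only in prose. As a minimal sketch, assuming a prompt split into segments (for example instruction, documents, question) and assuming that placeholder tokens are never fed to the model but merely advance the position counter, the position indices of the real tokens could be computed as follows. The function name `build_position_ids`, the segment lengths, and the gap values are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence


def build_position_ids(segment_lengths: Sequence[int], gaps: Sequence[int]) -> List[int]:
    """Return position ids for the real tokens of a segmented prompt.

    Assumption: gaps[i] placeholder tokens sit between segment i and
    segment i + 1. Placeholders are never materialized; they only consume
    position indices, so no extra tokens enter the attention computation.
    """
    if len(gaps) != len(segment_lengths) - 1:
        raise ValueError("need exactly one gap between each pair of adjacent segments")
    position_ids: List[int] = []
    next_pos = 0
    for i, length in enumerate(segment_lengths):
        # Real tokens of segment i occupy consecutive positions.
        position_ids.extend(range(next_pos, next_pos + length))
        next_pos += length
        if i < len(gaps):
            # Skip the indices "held" by the placeholder tokens after segment i.
            next_pos += gaps[i]
    return position_ids


# Hypothetical RAG-style prompt: instruction (3 tokens), documents (4),
# question (2), with theta_A = 100 and theta_B = 200 placeholders.
ids = build_position_ids([3, 4, 2], gaps=[100, 200])
print(ids)  # [0, 1, 2, 103, 104, 105, 106, 307, 308]
```

The text tokens and their order are untouched; only the gaps between position indices change. The resulting list could then be supplied as explicit position ids to a model implementation that accepts them.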

Paper Structure

This paper contains 17 sections, 8 equations, 4 figures, and 6 tables.

Introduction
Methodology
Preliminary
Altering Position Information in Prompts
Position Engineering
Experiments
Position Engineering for RAG
Universal Position Configuration for RAG
Without the instruction segment
Position Engineering for ICL
Discussion
Related Works
Conclusion
Limitations
Appendix
...and 2 more sections

Figures (4)

Figure 1: Comparison of prompt engineering and position engineering. "Para" refers to paragraphs and "Sent" to sentences in prompts. Prompt engineering adds, replaces, or removes paragraphs and sentences in the prompt. In contrast, the proposed position engineering keeps the original prompt text unchanged and inserts placeholder tokens into it. These placeholders do not participate in the computation of attention scores, so they add no computational overhead. However, they do occupy position indices, thereby changing the position information of the other tokens in the text.
Figure 2: Position engineering for RAG. "PH tokens" refers to the placeholder tokens introduced in Section "Altering Position Information in Prompts". We search a defined space by inserting $\theta_A$ placeholder tokens between the instruction and document segments and $\theta_B$ placeholder tokens between the document and question segments. Both $\theta_A$ and $\theta_B$ take values in $\{0, 100, \ldots, 2500\}$, subject to $\theta_A + \theta_B \leq 2500$.
Figure 3: Average percentile values for each positional configuration $(\theta_A, \theta_B)$. For a given dataset and a given number of retrieved documents, the accuracy scores of all configurations are first aggregated and converted into percentile scores. These percentile scores are then averaged across all such settings, as detailed in Section "Universal Position Configuration for RAG".
Figure 4: Position engineering for ICL. "PH tokens" refers to the placeholder tokens introduced in Section "Altering Position Information in Prompts". We search a defined space by inserting $\theta_A$ placeholder tokens between the instruction and document segments, $\theta_B$ placeholder tokens between the document and question segments, and $\theta_{mid}$ placeholder tokens among the examples. The candidate values of $\theta_A$ and $\theta_B$ are $\{0, 100, \ldots, 600\}$, while those of $\theta_{mid}$ are $\{0, 20, \ldots, 100\}$.
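The captions of Figures 2 to 4 define the search spaces and the percentile aggregation only in words. The sketch below enumerates both grids exactly as stated in the captions and shows one plausible reading of the aggregation behind Figure 3. The percentile definition (share of configurations scoring at or below a given one) and all function names are assumptions for illustration, not the authors' code.

```python
from itertools import product
from typing import Dict, List, Tuple

Config = Tuple[int, ...]


def rag_search_space(step: int = 100, budget: int = 2500) -> List[Tuple[int, int]]:
    """(theta_A, theta_B) pairs from {0, step, ..., budget} with theta_A + theta_B <= budget."""
    values = range(0, budget + 1, step)
    return [(a, b) for a, b in product(values, values) if a + b <= budget]


def icl_search_space() -> List[Tuple[int, int, int]]:
    """(theta_A, theta_B, theta_mid) triples from the candidate sets in Figure 4."""
    outer = range(0, 601, 100)  # {0, 100, ..., 600}
    mid = range(0, 101, 20)     # {0, 20, ..., 100}
    return list(product(outer, outer, mid))


def percentile_scores(acc: Dict[Config, float]) -> Dict[Config, float]:
    """Percentile of each configuration's accuracy within one setting.

    Assumed definition: share of configurations (including itself) whose
    accuracy is <= this configuration's accuracy, scaled to [0, 100].
    """
    values = list(acc.values())
    n = len(values)
    return {cfg: 100.0 * sum(v <= a for v in values) / n for cfg, a in acc.items()}


def average_percentiles(settings: List[Dict[Config, float]]) -> Dict[Config, float]:
    """Average each configuration's percentile over all (dataset, #documents) settings.

    Assumes every setting was evaluated on the same set of configurations.
    """
    if not settings:
        raise ValueError("need at least one setting")
    per_setting = [percentile_scores(acc) for acc in settings]
    configs = per_setting[0].keys()
    return {cfg: sum(p[cfg] for p in per_setting) / len(per_setting) for cfg in configs}


print(len(rag_search_space()))  # 351
print(len(icl_search_space()))  # 294
```

Converting raw accuracies to within-setting percentiles before averaging keeps datasets with very different absolute accuracy from dominating the average.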
