
Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information

Ming Jiang, Tingting Huang, Biao Guo, Yao Lu, Feng Zhang

TL;DR

This paper addresses the vulnerability of large language models to irrelevant information in problem descriptions by introducing GSMIR, a dataset designed to more realistically simulate interference than prior sets. It analyzes why LLMs are affected—namely, their partial ability to identify irrelevant content and a limited capacity to exclude it—and introduces ATF (Analysis to Filtration Prompting), a two-stage method that first analyzes the input to identify irrelevant clauses and then filters them before reasoning. Empirical results show that ATF substantially boosts reasoning accuracy across prompting paradigms, with the largest gains for Chain-of-Thought prompting, and increases the rate at which irrelevant information is recognized while reducing mis-exclusion. The work highlights a practical path to more robust prompt design and points to future directions for handling multiple irrelevant information pieces and broader model evaluations.
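
The two-stage idea above (analyse the input for irrelevant clauses, then filter them out before reasoning) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The prompt wording, the helper names (`split_sentences`, `analyse`, `filtrate`, `atf_solve`) and the `llm` callable (any function that maps a prompt string to a reply string) are all hypothetical. As a simplification, the filtration step removes the flagged sentences in code rather than asking the model to rewrite the problem.

```python
import re
from typing import Callable

# Hypothetical stage-1 prompt (not the paper's wording): ask the model to
# judge each numbered sentence and report the irrelevant ones.
ANALYSIS_TEMPLATE = (
    "Below is a math word problem with numbered sentences.\n"
    "{numbered}\n"
    "For each sentence, decide whether it is needed to answer the question. "
    "Reply with the numbers of the irrelevant sentences, comma-separated, or 'none'."
)

# Hypothetical final prompt: ordinary Chain-of-Thought on the filtered problem.
REASONING_TEMPLATE = "{problem}\nLet's think step by step."


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (adequate for GSM8K-style text)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def analyse(problem: str, llm: Callable[[str], str]) -> set[int]:
    """Stage 1 (analysis): return the 1-based indices the model flags as irrelevant."""
    sentences = split_sentences(problem)
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, start=1))
    reply = llm(ANALYSIS_TEMPLATE.format(numbered=numbered))
    flagged = {int(n) for n in re.findall(r"\d+", reply)}
    # Design choice for this sketch: discard out-of-range indices and never
    # drop the last sentence, which is assumed to hold the actual question.
    return {i for i in flagged if 1 <= i < len(sentences)}


def filtrate(problem: str, irrelevant: set[int]) -> str:
    """Stage 2 (filtration): rebuild the problem without the flagged sentences."""
    sentences = split_sentences(problem)
    return " ".join(s for i, s in enumerate(sentences, start=1) if i not in irrelevant)


def atf_solve(problem: str, llm: Callable[[str], str]) -> str:
    """Analyse, filter, then reason over the cleaned problem."""
    cleaned = filtrate(problem, analyse(problem, llm))
    return llm(REASONING_TEMPLATE.format(problem=cleaned))
```

Reading every integer in the reply as an index is deliberately crude: a reply such as "sentence 2 mentions 12 years" would also pick up 12 if the problem had that many sentences, so a practical pipeline would want a stricter output format from stage 1.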

Abstract

In recent years, large language models (LLMs) have garnered significant attention due to their superior performance in complex reasoning tasks. However, recent studies have shown that their reasoning capabilities can diminish markedly when problem descriptions contain irrelevant information, even with advanced prompting techniques. To investigate this issue further, a dataset of primary school mathematics problems containing irrelevant information, named GSMIR, was constructed. Testing prominent LLMs and prompting techniques on this dataset revealed that while LLMs can identify irrelevant information, they do not effectively mitigate the interference it causes once identified. To address this shortcoming, a novel automatic construction method, ATF, is proposed, which enhances the ability of LLMs to identify and self-mitigate the influence of irrelevant information. The method operates in two steps: first, analysis of the irrelevant information, then its filtration. Experimental results show that ATF significantly improves the reasoning performance of LLMs and prompting techniques on the GSMIR dataset, even in the presence of irrelevant information.
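
Figure 1 describes how a GSMIR item is formed: an irrelevant sentence is inserted before the standard problem without changing its answer. A minimal Python sketch of that insertion step might look like the following. It assumes a hypothetical `MathProblem` record and a distractor sentence supplied by hand, and it does not implement the paper's criteria for choosing distractors.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MathProblem:
    question: str  # original GSM8K-style problem text
    answer: float  # gold numeric answer


def with_irrelevant_sentence(base: MathProblem, distractor: str) -> MathProblem:
    """Prepend a distractor sentence, leaving the gold answer untouched.

    Whether the distractor is truly irrelevant (it must not change the
    derivation of the answer) has to be checked separately; this function
    only performs the insertion.
    """
    text = f"{distractor.strip()} {base.question.strip()}"
    return replace(base, question=text)


def build_dataset(pairs: list[tuple[MathProblem, str]]) -> list[MathProblem]:
    """Apply the insertion to every (problem, distractor) pair."""
    return [with_irrelevant_sentence(p, d) for p, d in pairs]
```

Because the record is frozen and `dataclasses.replace` returns a new instance, the original problem is kept intact, which makes it easy to evaluate the clean and perturbed versions side by side.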

Paper Structure (21 sections, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The key factors considered in creating the GSMIR dataset, with an example problem. An irrelevant sentence (highlighted in bold yellow) is inserted before the standard problem, chosen so that it does not affect the derivation of the standard answer.
  • Figure 2: The prompt formats employed, with differently coloured rectangular blocks representing each component. The blocks on the right correspond to those of the same colour on the left (viewing in colour is recommended for easier identification). The "Or" symbol indicates that any one of the building blocks may be chosen. [Questions with irrelevant information] are generated by adding an unrelated sentence (in red font) to the [original question description].
  • Figure 3: The rate at which each of the two methods identifies irrelevant information.
  • Figure 4: Accuracy rates of both methods on the problems whose errors were caused by the influence of irrelevant information.
  • Figure 5: An overview of the ATF method: (a) extracting information from the question, analysing it, and identifying the irrelevant information; (b) filtering the irrelevant information out of the question.
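
Figures 3 and 4 report identification rates and accuracy. As a hedged illustration of how such quantities could be computed from per-problem outcomes, the sketch below assumes a hypothetical `Outcome` record with two boolean fields. These definitions are illustrative and are not necessarily the ones used to produce the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    identified: bool  # hypothetical: the model flagged the inserted irrelevant sentence
    correct: bool     # the final answer matched the gold answer


def identification_rate(outcomes: list[Outcome]) -> float:
    """Share of problems whose inserted irrelevant sentence was flagged."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(o.identified for o in outcomes) / len(outcomes)


def accuracy_when_identified(outcomes: list[Outcome]) -> float:
    """Accuracy restricted to problems where the irrelevant sentence was flagged."""
    hits = [o for o in outcomes if o.identified]
    if not hits:
        raise ValueError("no problems with an identified irrelevant sentence")
    return sum(o.correct for o in hits) / len(hits)
```

A high identification rate paired with low accuracy on the identified problems is the pattern the abstract describes: the irrelevant sentence is noticed but still interferes with the reasoning.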