Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach

Sinan Fan, Liang Xie, Chen Shen, Ge Teng, Xiaosong Yuan, Xiaofeng Zhang, Chenxi Huang, Wenxiao Wang, Xiaofei He, Jieping Ye

TL;DR

Prompt Tuning offers parameter-efficient benefits but struggles with complex reasoning due to how soft prompts can both aid and mislead subsequent steps. The authors propose Dynamic Prompt Corruption (DPC), a two-stage strategy consisting of Dynamic Trigger and Dynamic Corruption, to adaptively identify and neutralize harmful soft-prompt influence during reasoning. Across LLaMA2-13B, LLaMA3-8B, and Mistral-0.2-7B, DPC yields consistent 4–8 percentage-point gains on GSM8K, MATH, and AQuA compared with vanilla prompt tuning, demonstrating robust improvement in multi-step reasoning tasks. While effective, DPC incurs additional inference cost due to instance-level analysis, highlighting a trade-off between reasoning accuracy and computation that motivates further efficiency improvements and interpretability studies.

Abstract

Prompt-tuning (PT) for large language models (LLMs) can improve performance on various conventional NLP tasks with significantly fewer trainable parameters. However, our investigation reveals that PT provides limited improvement and may even degrade the original performance of LLMs on complex reasoning tasks. This phenomenon suggests that soft prompts can positively impact certain instances while negatively affecting others, particularly during the later phases of reasoning. To address these challenges, we first identify an information accumulation phenomenon within the soft prompts. Through detailed analysis, we demonstrate that this phenomenon is often accompanied by erroneous information flow patterns in the deeper layers of the model, which ultimately lead to incorrect reasoning outcomes. We then propose a novel method called Dynamic Prompt Corruption (DPC) to take better advantage of soft prompts in complex reasoning tasks; it dynamically adjusts the influence of soft prompts based on their impact on the reasoning process. Specifically, DPC consists of two stages: Dynamic Trigger and Dynamic Corruption. First, Dynamic Trigger measures the impact of soft prompts, identifying whether it is beneficial or detrimental. Then, Dynamic Corruption mitigates the negative effects of soft prompts by selectively masking key tokens that interfere with the reasoning process. We validate the proposed approach through extensive experiments on various LLMs and reasoning tasks, including GSM8K, MATH, and AQuA. Experimental results demonstrate that DPC can consistently enhance the performance of PT, achieving 4%-8% accuracy gains compared to vanilla prompt tuning, highlighting the effectiveness of our approach and its potential to enhance complex reasoning in LLMs.
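The two stages named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the saliency layout (`saliency[layer][target][source]`), the split between shallow and deep layers (`split_layer`), the "latter half of the rationale" heuristic, the `threshold`, and all function names are assumptions made here for illustration only.

```python
def dynamic_trigger(saliency, prompt_len, rationale_start, split_layer, threshold):
    """Assumed trigger rule: fire when the mean deep-layer flow from soft
    prompt tokens to the latter half of the rationale exceeds `threshold`."""
    seq_len = len(saliency[0])
    latter_start = rationale_start + (seq_len - rationale_start) // 2
    values = [
        saliency[layer][target][source]
        for layer in range(split_layer, len(saliency))
        for target in range(latter_start, seq_len)
        for source in range(prompt_len)
    ]
    if not values:
        return False
    return sum(values) / len(values) > threshold


def dynamic_corruption(saliency, prompt_len, split_layer, top_k=1):
    """Assumed corruption rule: mask the `top_k` soft prompt tokens with the
    largest shallow-layer saliency accumulation (0.0 = masked, 1.0 = kept)."""
    seq_len = len(saliency[0])
    accumulation = [
        sum(
            saliency[layer][target][source]
            for layer in range(split_layer)
            for target in range(prompt_len, seq_len)
        )
        for source in range(prompt_len)
    ]
    ranked = sorted(range(prompt_len), key=lambda s: accumulation[s], reverse=True)
    worst = set(ranked[:top_k])
    return [0.0 if s in worst else 1.0 for s in range(prompt_len)]


def dpc_mask(saliency, prompt_len, rationale_start, split_layer, threshold):
    """Corrupt soft prompt tokens only for instances where the trigger fires."""
    if dynamic_trigger(saliency, prompt_len, rationale_start, split_layer, threshold):
        return dynamic_corruption(saliency, prompt_len, split_layer)
    return [1.0] * prompt_len
```

In this sketch the returned mask would be multiplied into the soft prompt embeddings (or applied as an attention mask) before re-running inference on the flagged instance, which is where the extra per-instance inference cost mentioned in the TL;DR would come from.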

Paper Structure

This paper contains 37 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The same question is given to the LLM with and without soft prompts. The model originally provides the correct answer, but after the soft prompts are added it makes an error in reasoning.
  • Figure 2: (a) Saliency scores of prompt-to-question and prompt-to-rationale, layer by layer. (b) Illustration of a significant information accumulation phenomenon within soft prompts, where a specific token conveys strong information to both the question and the rationale.
  • Figure 3: Schematic illustration of our observations. Correct-answer cases (left) show a balanced accumulation of saliency in shallow layers, where rationale information is evenly gathered from the soft prompts. As the reasoning progresses into deeper layers, attention shifts from the soft prompts to earlier rationale steps and the question itself. In contrast, wrong-answer cases (right) exhibit excessive saliency in shallow layers and disruptions in the information flow in deep layers. The latter part of the reasoning in deep layers overly focuses on the soft prompts, leading to incorrect answers.
  • Figure 4: (a) Relationship between information accumulation in shallow layers and the change of information flow patterns in deep layers. (b) Distribution of information flow intensity from soft prompts to the latter part of the rationale in good and bad cases: the flow tends to be weaker in good cases and stronger in bad cases. (c) Overall intensity of information flow from the soft prompts to the latter part of the rationale for good and bad cases.
  • Figure 5: Overview of our method, Dynamic Prompt Corruption (DPC), which dynamically identifies erroneous information flow patterns and corrupts soft prompt tokens based on the location of information accumulation, alleviating the negative effects caused by the soft prompts in certain situations.
  • ...and 2 more figures