Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?

Zhanke Zhou, Rong Tao, Jianing Zhu, Yiwen Luo, Zengmao Wang, Bo Han

TL;DR

This paper proposes contrastive denoising with noisy chain-of-thought (CD-CoT), a method that enhances LLMs' denoising-reasoning capabilities by contrasting noisy rationales with a single clean rationale, which can be regarded as the minimal requirement for denoising-purpose prompting.

Abstract

This paper investigates an under-explored challenge in large language models (LLMs): chain-of-thought prompting with noisy rationales, that is, rationales that include irrelevant or inaccurate reasoning thoughts within the examples used for in-context learning. We construct the NoRa dataset, which is tailored to evaluate the robustness of reasoning in the presence of noisy rationales. Our findings on the NoRa dataset reveal a prevalent vulnerability to such noise among current LLMs, with existing robust methods like self-correction and self-consistency showing limited efficacy. Notably, compared to prompting with clean rationales, the base LLM drops by 1.4%-19.8% in accuracy with irrelevant thoughts and more drastically by 2.2%-40.4% with inaccurate thoughts. Addressing this challenge necessitates external supervision that should be accessible in practice. Here, we propose the method of contrastive denoising with noisy chain-of-thought (CD-CoT). It enhances LLMs' denoising-reasoning capabilities by contrasting noisy rationales with only one clean rationale, which can be regarded as the minimal requirement for denoising-purpose prompting. This method follows a principle of exploration and exploitation: (1) rephrasing and selecting rationales in the input space to achieve explicit denoising and (2) exploring diverse reasoning paths and voting on answers in the output space. Empirically, CD-CoT demonstrates an average improvement of 17.8% in accuracy over the base model and shows significantly stronger denoising capabilities than baseline methods. The source code is publicly available at: https://github.com/tmlr-group/NoisyRationales.
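To make the notion of a noisy rationale concrete, the toy sketch below interleaves extra thoughts into a clean rationale. It assumes a rationale can be represented as a list of thought strings; the helper `add_noise`, its `ratio` parameter, and the example thoughts are hypothetical illustrations and are not the NoRa construction procedure.

```python
import random
from typing import List


def add_noise(thoughts: List[str], noisy_thoughts: List[str],
              ratio: float, seed: int = 0) -> List[str]:
    """Return a copy of `thoughts` with noisy thoughts interleaved.

    After each clean thought, one noisy thought is appended with
    probability `ratio`. The clean thoughts keep their original order.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out: List[str] = []
    for thought in thoughts:
        out.append(thought)
        if noisy_thoughts and rng.random() < ratio:
            out.append(rng.choice(noisy_thoughts))
    return out


clean = ["8 + 5 = 13 in base-10.", "13 in base-10 is 14 in base-9."]
# Irrelevant noise: true or harmless statements that do not help the task.
irrelevant = ["13 is a prime number."]
# Inaccurate noise: statements that are wrong for the task at hand.
inaccurate = ["13 in base-10 is also 13 in base-9."]

print(add_noise(clean, irrelevant, ratio=1.0))
print(add_noise(clean, inaccurate, ratio=1.0))
```

With `ratio=1.0` every clean thought is followed by one noisy thought, and with `ratio=0.0` the rationale is returned unchanged.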

Paper Structure

This paper contains 46 sections, 2 theorems, 19 equations, 14 figures, 35 tables, 1 algorithm.

Key Result

Lemma D.3

Let $\mathcal{B}$ denote the set of $\theta$ that do not satisfy Condition cond:2. We assume that $\text{KL}(p_{prompt}(y_\text{test}|x_\text{test}) \,\|\, p(y_\text{test}|x_\text{test},\theta))$ is bounded for all $\theta$ and that $\theta^*$ minimizes the multi-class logistic risk as, We can have if then where $g(\tau) = \frac{1}{2}((1-\tau)\log(1-\tau)+(1+\tau)\log(1+\tau))$ is the calibration func
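As a small numerical sketch, the function $g(\tau)$ written above can be evaluated directly. The sketch assumes $\tau \in [-1, 1]$ and the usual convention $0 \log 0 = 0$ at the endpoints; both are assumptions made here for illustration, and the function name `g` simply mirrors the symbol in the text.

```python
import math


def g(tau: float) -> float:
    """g(tau) = 0.5 * ((1 - tau) * log(1 - tau) + (1 + tau) * log(1 + tau)).

    Assumes -1 <= tau <= 1 and uses the convention 0 * log(0) = 0.
    """
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")

    def xlogx(x: float) -> float:
        return 0.0 if x == 0.0 else x * math.log(x)

    return 0.5 * (xlogx(1.0 - tau) + xlogx(1.0 + tau))


for tau in (0.0, 0.25, 0.5, 1.0):
    print(f"g({tau}) = {g(tau):.6f}")
```

Under these assumptions, $g(0) = 0$, $g$ is symmetric in $\tau$, and $g(1) = \log 2$.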

Figures (14)

  • Figure 1: Exemplars of noisy questions (Shi et al., 2023) and noisy rationales (our new research problem). Each input includes three prompting examples and one test question. Notably, the test question asks about base-9 calculation, while misleading base-10 information is given in the noisy questions or rationales.
  • Figure 2: Results of GPT-3.5 with 0-shot, 3-shot clean rationales, and 3-shot noisy rationales: Both inaccurate and irrelevant rationales degrade performance significantly, while the proposed CD-CoT improves robustness against noisy rationales.
  • Figure 3: Chain modeling of the noisy rationale problem: Recovering chain (3) from chain (1) with the guidance of chain (2). From question $x_i$ to answer $y_i$, the rationale of chain (3) includes clean thoughts $T_{i}^{(j)}$ and noisy thoughts $\hat{T}_{i}^{(j)}$.
  • Figure 4: CD-CoT's first two steps for data denoising. First, it rephrases the $i$-th noisy example by contrasting it with the clean example. Then, with the obtained $N$ rephrased examples, it selects the $M$ qualified candidates by checking the validity of the rephrased answers $\hat{y}_{i1}, \ldots, \hat{y}_{iN}$ w.r.t. $y_i$.
  • Figure 5: CD-CoT constructs $M$ inputs ($K$-shot) by allocating the $K \cdot M$ rephrased rationales. These inputs are concatenated with the clean example and test question and then fed to an LLM for reasoning separately. The obtained $D$ answers are aggregated by equal-weight voting to obtain the final answer $y$.
  • ...and 9 more figures
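
The steps described in the captions of Figures 4 and 5 (rephrase, select, allocate, reason, vote) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` is a hypothetical callable mapping a prompt string to a completion string, the prompt wording and the `Answer:` output format are invented for the sketch, and taking $D = M \times$ `samples_per_input` is an assumption. The released code at the repository linked in the abstract is the reference.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str, str]  # (question, rationale, answer)
LLM = Callable[[str], str]      # hypothetical: prompt in, completion out


def parse_answer(completion: str) -> str:
    """Assumed format: the final answer follows the last 'Answer:'."""
    return completion.rsplit("Answer:", 1)[-1].strip()


def fmt(ex: Example) -> str:
    question, rationale, answer = ex
    return f"Q: {question}\nRationale: {rationale}\nAnswer: {answer}"


def rephrase_and_select(llm: LLM, noisy: Example, clean: Example,
                        n: int, m: int) -> List[Example]:
    """Rephrase one noisy example n times by contrast with the clean
    example, then keep at most m rephrasings whose answer matches y_i."""
    question, _, y = noisy
    prompt = (
        "Here is a clean example:\n" + fmt(clean)
        + "\n\nRewrite the rationale of the next example, dropping "
        "irrelevant or inaccurate thoughts, and end with "
        "'Answer: <answer>'.\n" + fmt(noisy)
    )
    kept: List[Example] = []
    for _ in range(n):
        out = llm(prompt)
        if parse_answer(out) == y:  # validity check w.r.t. y_i
            rationale = out.rsplit("Answer:", 1)[0].strip()
            kept.append((question, rationale, y))
    return kept[:m]


def cd_cot(llm: LLM, noisy_examples: List[Example], clean: Example,
           test_question: str, n: int = 5, m: int = 2,
           samples_per_input: int = 2) -> str:
    """Build m inputs from the selected rephrasings, sample answers
    from each input, and return the majority-voted answer."""
    candidates = [rephrase_and_select(llm, ex, clean, n, m)
                  for ex in noisy_examples]
    candidates = [c for c in candidates if c]  # drop examples with no valid rephrasing
    answers: List[str] = []
    for j in range(m):
        shots = [c[j % len(c)] for c in candidates]  # one rephrasing per example
        prompt = ("\n\n".join(fmt(ex) for ex in [clean, *shots])
                  + f"\n\nQ: {test_question}\nRationale:")
        for _ in range(samples_per_input):
            answers.append(parse_answer(llm(prompt)))
    return Counter(answers).most_common(1)[0][0]  # equal-weight vote
```

In practice `llm` would wrap a sampling call to a model with a nonzero temperature, so that repeated calls yield different rephrasings and reasoning paths; with a deterministic callable the vote is trivially unanimous.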

Theorems & Definitions (4)

  • Remark 3.1
  • Lemma D.3: noisy-relaxed bound in Xie et al. (2021)
  • Theorem D.4
  • Proof: Proof of Theorem \ref{theo:1}