Instructing Large Language Models to Identify and Ignore Irrelevant Conditions

Zhenyu Wu; Chao Shen; Meng Jiang

TL;DR

A novel approach named I$^3$C is proposed that instructs LLMs to identify and ignore irrelevant conditions, together with I$^3$C-Select, which selects the most confusing problems as demonstrations based on a semantic relevance measurement.

Abstract

Math word problem (MWP) solving requires generating a reasoning path from a problem description that often contains irrelevant conditions. Existing chain-of-thought (CoT) prompting methods elicit the multi-step reasoning abilities of large language models (LLMs) to solve MWPs. However, these methods are seriously confused by irrelevant conditions, resulting in low accuracy. In this paper, we propose a novel approach named I$^3$C that instructs LLMs to identify and ignore irrelevant conditions. It first identifies a set of irrelevant condition candidates that have weak semantic relevance to the question. It then prompts LLMs to verify these candidates. Finally, it instructs the LLMs with the verification results on relevant and irrelevant conditions to avoid confusion and improve their reasoning paths. Moreover, we propose selecting (problem, reasoning path) pairs as demonstrations to enhance I$^3$C with few-shot reasoning. We develop I$^3$C-Select, which selects the most confusing problems as demonstrations based on the semantic relevance measurement. We conduct extensive experiments on eight MWP datasets. I$^3$C can be combined with any CoT prompting method to improve MWP-solving performance. Notably, with GPT-3.5-Turbo and I$^3$C-Select, we achieve accuracies of 96.0 and 94.1 on GSM-IC2-1K and GSM-ICM-1K, respectively, significantly outperforming the state-of-the-art few-shot prompting method Complex-CoT by +11.7 and +11.1. Our implementation is publicly available at https://wzy6642.github.io/I3C.github.io/.
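The candidate-identification and instruction steps of I$^3$C can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the last sentence of a problem is the question and earlier sentences are conditions, substitutes a bag-of-words vector for a learned sentence encoder, measures semantic relevance as cosine similarity to the question, and skips the LLM verification step by treating every candidate as verified. The helper names (`split_problem`, `irrelevant_candidates`, `build_instruction`) and the instruction wording are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence encoder: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_problem(problem: str) -> tuple[list[str], str]:
    # Assumption: the last sentence is the question; earlier sentences are conditions.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", problem.strip()) if s.strip()]
    return sentences[:-1], sentences[-1]


def irrelevant_candidates(problem: str, threshold: float) -> list[str]:
    # A condition becomes a candidate when its similarity to the question is below the threshold.
    conditions, question = split_problem(problem)
    q = embed(question)
    return [c for c in conditions if cosine(embed(c), q) < threshold]


def build_instruction(irrelevant: list[str]) -> str:
    # Hypothetical instruction wording. The LLM verification step is skipped here,
    # so every candidate is treated as a verified irrelevant condition.
    if not irrelevant:
        return "All conditions appear relevant. Solve the problem step by step."
    listed = " ".join(f'"{c}"' for c in irrelevant)
    return (
        f"The following conditions are irrelevant to the question and should be ignored: {listed} "
        "Solve the problem step by step using only the relevant conditions."
    )


if __name__ == "__main__":
    problem = ("Lucy has 5 apples. Her brother is 12 years old. "
               "She buys 3 more apples. How many apples does Lucy have?")
    candidates = irrelevant_candidates(problem, threshold=0.1)
    print(candidates)  # ['Her brother is 12 years old.']
    print(build_instruction(candidates))
```

In this toy setup, raising `threshold` flags more conditions as candidates (at 0.5 all three conditions above are flagged), which is consistent with the trend reported in Figure 4(b); with a real encoder the similarity values, and therefore a sensible threshold, would differ.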

Paper Structure (39 sections, 3 equations, 4 figures, 15 tables)

Introduction
Related Work
Math Word Problem Solving
Chain-of-Thought Prompting Methods
Identify Irrelevant Information
Proposed Approach
Overview
Identify a Set of Irrelevant Condition Candidates
Construct I$^3$C Instruction
Generate Reasoning Paths and Answers with I$^3$C Instruction
I$^3$C-Select: Select Confusing Problems as Automatic Demonstrations
Experiments
Experimental Setup
Datasets.
Baselines.
...and 24 more sections

Figures (4)

Figure 1: The proposed I$^3$C approach instructs LLMs to Identify and Ignore Irrelevant Conditions.
Figure 2: Performance comparison of Complex-CoT, Complex-CoT with I$^3$C instruction (i.e., Complex-CoT+I$^3$C), and Complex-CoT with self-consistency (i.e., Complex-CoT-Self-Consistency). The accuracy of Complex-CoT+I$^3$C and Complex-CoT-Self-Consistency is nearly identical, while Complex-CoT+I$^3$C consumes far fewer tokens and much less time than Complex-CoT-Self-Consistency.
Figure 3: Demonstration construction methods comparison. "Low" indicates selecting eight problems with the lowest confusion scores. "Medium" indicates randomly selecting eight problems. "High" indicates selecting eight problems with the highest confusion scores.
Figure 4: Hyperparameter analysis. (a) As the threshold increases, the recall scores of identified irrelevant condition candidates first increase and then remain unchanged for all datasets except SingleEq. (b) As the threshold increases, the percentage of conditions to be verified first increases and then remains unchanged for all datasets.
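The "Low", "Medium", and "High" demonstration strategies compared in Figure 3 can be sketched as below. The confusion score here is a hypothetical stand-in (one minus the mean cosine similarity between a problem's conditions and its question, using a bag-of-words vector in place of a sentence encoder); the paper's exact scoring formula is not given in this section. `confusion_score` and `select_demonstrations` are illustrative names, and the default `k = 8` follows the figure caption.

```python
import math
import random
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in encoder: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_problem(problem: str) -> tuple[list[str], str]:
    # Assumption: the last sentence is the question; earlier sentences are conditions.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", problem.strip()) if s.strip()]
    return sentences[:-1], sentences[-1]


def confusion_score(problem: str) -> float:
    # Hypothetical score: higher when conditions are, on average, less similar to the question.
    conditions, question = split_problem(problem)
    if not conditions:
        return 0.0
    q = embed(question)
    mean_sim = sum(cosine(embed(c), q) for c in conditions) / len(conditions)
    return 1.0 - mean_sim


def select_demonstrations(problems: list[str], k: int = 8, mode: str = "high", seed: int = 0) -> list[str]:
    if mode == "medium":
        # "Medium": random selection, independent of the score.
        return random.Random(seed).sample(problems, min(k, len(problems)))
    if mode not in ("high", "low"):
        raise ValueError(f"unknown mode: {mode}")
    # "High" takes the k highest confusion scores; "Low" takes the k lowest.
    ranked = sorted(problems, key=confusion_score, reverse=(mode == "high"))
    return ranked[:k]
```

Under this toy score, a problem whose conditions share few words with its question (for example, "A dog runs fast. The sky is blue. How many legs does a cat have?") ranks above one whose conditions all mention the queried entity, so it would be preferred by the "High" setting.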