CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

Seungone Kim; Se June Joo; Yul Jang; Hyungjoo Chae; Jinyoung Yeo

TL;DR

CoTEVer addresses the problem of unfaithful chain-of-thought explanations by introducing an annotation toolkit that verifies explanations against retrieved evidence and collects revision data. It combines prompting, evidence retrieval, and annotator verification to produce grounded CoT data that can support downstream CoT fine-tuning and the development of knowledge-intensive tasks. The paper also analyzes common explanation errors and outlines practical use cases, including unlikelihood training and fact verification. The toolkit is publicly available.

Abstract

Chain-of-thought (CoT) prompting enables large language models (LLMs) to solve complex reasoning tasks by generating an explanation before the final prediction. Despite its promising ability, a critical downside of CoT prompting is that performance is greatly affected by the factuality of the generated explanation. To improve the correctness of the explanations, fine-tuning language models with explanation data is needed. However, there exist only a few datasets that can be used for such approaches, and no data collection tool for building them. Thus, we introduce CoTEVer, a toolkit for annotating the factual correctness of generated explanations and collecting revision data of wrong explanations. Furthermore, we suggest several use cases where the data collected with CoTEVer can be utilized for enhancing the faithfulness of explanations. Our toolkit is publicly available at https://github.com/SeungoneKim/CoTEVer.

Paper Structure (17 sections, 3 equations, 4 figures, 4 tables)

Introduction
Related Works
Tool-kits for Data Annotation
Explanation Data
Hallucination in Language Models
System Design and Workflow
S1: Prompting
S2: Evidence Retrieval
S3: Explanation and Answer Verification
Analysis of Explanation Data
How to Utilize Explanation Data gathered with CoTEVer
Chain of Thought Fine-tuning
Knowledge-Intensive Tasks
Conclusion
Link to Video & Code
...and 2 more sections

Figures (4)

Figure 1: Example of Explanation Verification and Answer Verification of GPT-3's output. Explanation Verification requires additional knowledge, which makes it hard for annotators to intuitively write a revised explanation and answer.
Figure 2: Overview of CoTEVer. An annotator asks CoTEVer a question and receives an explanation, supporting evidence documents, and a prediction. The annotator's rating of the explanation (5 for most relevant) and suggestions for a better explanation are then stored in the database, which can be used for research purposes.
Figure 3: Snapshot of CoTEVer. The annotator types in a question and receives the output of a large language model (e.g., GPT-3).
Figure 4: Snapshot of CoTEVer. The annotator can check the retrieved evidence documents to verify each step within the explanation.
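
The caption of Figure 2 lists what an annotation session produces: a question, a generated explanation, evidence documents, a prediction, a rating, and a suggested better explanation. The following is a minimal sketch of how one such record might be represented, not the toolkit's actual schema. The class name `AnnotationRecord`, all field names, the helper `collect_revisions`, and the 1 to 5 rating range are assumptions made for illustration (the caption only states that 5 is the top rating), and the sample record is invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AnnotationRecord:
    """Hypothetical container for one annotation session (not CoTEVer's real schema)."""
    question: str
    explanation: str                  # explanation generated by the language model
    evidence_documents: List[str]     # retrieved documents shown to the annotator
    prediction: str                   # final answer generated by the language model
    rating: int                       # assumed scale: 1 (lowest) to 5 (highest)
    suggested_explanation: Optional[str] = None  # annotator's revision, if any

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..5 scale.
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


def collect_revisions(records: List[AnnotationRecord]) -> List[Tuple[str, str]]:
    """Return (original explanation, revised explanation) pairs for revised records."""
    return [
        (r.explanation, r.suggested_explanation)
        for r in records
        if r.suggested_explanation is not None
    ]


# Invented example record, for illustration only.
example = AnnotationRecord(
    question="At what temperature does water boil at sea level?",
    explanation="Water boils at 90 degrees Celsius at sea level.",
    evidence_documents=["Water boils at 100 degrees Celsius at sea level."],
    prediction="90 degrees Celsius",
    rating=1,
    suggested_explanation="Water boils at 100 degrees Celsius at sea level.",
)
revision_pairs = collect_revisions([example])
```

Pairs of original and revised explanations like `revision_pairs` are the kind of revision data the abstract describes collecting for wrong explanations.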
