LaMsS: When Large Language Models Meet Self-Skepticism

Yetao Wu; Yihong Wang; Teng Chen; Ningyuan Xi; Qingqing Gu; Hongyang Lei; Luo Ji

TL;DR

LaMsS addresses hallucination in large language models by endowing them with self-skepticism through skepticism tokens added to the tokenizer. The method combines continual pre-training on augmented data and supervised finetuning with a two-pass prompting mechanism, enabling the model to output both a response and a skepticism assessment. Empirically, LaMsS achieves state-of-the-art results on MCQ and open-domain QA benchmarks and demonstrates robust generalization to in-domain and out-of-domain tasks, with a threshold epsilon around 0.5 guiding willingness to answer. The approach provides a model-based pathway to more reliable, self-aware LLMs without relying on external knowledge sources.

Abstract

Hallucination is a major challenge for large language models (LLMs) and prevents their wider application in some fields. Human skeptical thinking could help LLMs develop self-cognition and self-reflection and thereby alleviate hallucination. Inspired by this consideration, we propose a novel approach called LaMsS, which combines the semantic understanding capability of LLMs with self-skepticism. We introduce a series of skepticism tokens, add them to the vocabulary, and conduct both pretraining and finetuning, which allows the LLM to decode each normal token followed by a skepticism token representing a skepticism level. By calculating the response skepticism for a given query, one can define a new self-aware LLM that is willing to answer only when its skepticism level is lower than a threshold. By examining the accuracy, AUC and AP of willingly answered questions, we demonstrate that LaMsS achieves better performance than baselines on both multiple-choice question and open-domain question-answering benchmarks, and can generalize to multi-task and out-of-domain settings. Our study sheds some light on self-skepticism modeling for future artificial intelligence. Project code and model checkpoints can be found at https://anonymous.4open.science/r/SM-1E76.
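The thresholding rule described in the abstract can be illustrated with a minimal Python sketch. It assumes the response skepticism is already available as a probability in [0, 1]; the function name `willing_answer` and the use of `None` to signal abstention are hypothetical choices made for illustration, not the authors' implementation. The default threshold of 0.5 follows the value mentioned in the TL;DR.

```python
from typing import Optional


def willing_answer(answer: str, skepticism: float, epsilon: float = 0.5) -> Optional[str]:
    """Return `answer` only when `skepticism` is strictly below `epsilon`.

    `skepticism` is assumed to be a probability in [0, 1] (hypothetical input;
    how it is computed is not shown here). `None` stands for "unwilling to answer".
    """
    if not 0.0 <= skepticism <= 1.0:
        raise ValueError("skepticism must lie in [0, 1]")
    # Answer only when the skepticism level is lower than the threshold.
    return answer if skepticism < epsilon else None


# Toy values, chosen only for illustration.
confident = willing_answer("Paris", skepticism=0.1)    # low skepticism: answer is kept
doubtful = willing_answer("pigeon", skepticism=0.9)    # high skepticism: model abstains
```

Raising `epsilon` in this sketch makes the model answer more often at the cost of letting through more doubtful responses; lowering it makes the model more conservative.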

Paper Structure (21 sections, 10 equations, 5 figures, 5 tables)

Introduction
Method
Tokenization and Annotation of Skepticism
Stage I: Continual Pre-Training
Stage II: Supervised Finetuning
Stage III: Inference
Experiments
Datasets
Setting
Evaluation
Single-task Results
Multi-task Results
Ablation Study
Sensitivity Study
Typical Cases
...and 6 more sections

Figures (5)

Figure 1: Paradigm of Self-Skepticism by LLM. The emojis represent the self-skepticism levels that the LLM itself assigns to the 'formal' tokens. Problematic, counterfactual phrases (e.g., 'pigeon' after 'capital') arouse suspicion and skepticism.
Figure 2: Detailed Framework of LaMsS. Stage I: first learn the plausibility of tokens from the pretrained LLM, then continue pretraining on the corpus with the vocabulary augmented with skepticism tokens. Stage II: augment the QA pair with the question 'Are you sure/unsure', run inference with the continually pretrained LLM to answer this augmented question, and finally finetune on these two QA pairs. Stage III: first run inference with the finetuned LLM to get the most plausible answer, then concatenate it with the augmented question and run inference a second time to obtain the skepticism probability.
Figure 3: Sensitivity Plots of MMLU Metrics as Functions of Skepticism Thresholds $\epsilon$. Left: ID; Right: OOD.
Figure 4: Loss Curves of the CPT stage of LaMsS. Left: the training set; Right: the test set.
Figure 5: Multitask Experimental Precision-Recall Curves on MMLU, with ID and OOD Subsets.
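The two-pass inference of Stage III, as described in the Figure 2 caption, might be sketched as follows. This is only an illustration under stated assumptions: `generate` and `unsure_probability` are hypothetical callables standing in for the finetuned LLM, and the newline-joined prompt layout is an assumed template, not the one used by the authors. Only the probe question 'Are you sure/unsure' is taken from the caption.

```python
from typing import Callable, Tuple


def two_pass_inference(
    question: str,
    generate: Callable[[str], str],
    unsure_probability: Callable[[str], float],
    probe: str = "Are you sure/unsure",
) -> Tuple[str, float]:
    """Run the two passes: answer first, then query the skepticism probability.

    `generate` and `unsure_probability` are hypothetical stand-ins for calls
    to the finetuned model; the prompt layout below is an assumption.
    """
    # Pass 1: obtain the most plausible answer to the original question.
    answer = generate(question)
    # Pass 2: concatenate question, answer and probe, then ask for the skepticism.
    augmented_prompt = f"{question}\n{answer}\n{probe}"
    skepticism = unsure_probability(augmented_prompt)
    return answer, skepticism


# Toy stand-ins so the sketch runs without a real model (illustrative only).
def toy_generate(prompt: str) -> str:
    return "Paris"


def toy_unsure_probability(prompt: str) -> float:
    return 0.2 if "Paris" in prompt else 0.9


result = two_pass_inference(
    "What is the capital of France?", toy_generate, toy_unsure_probability
)
```

With a real model, the second callable would read the probability the model places on the "unsure" outcome; the returned pair can then be compared against the threshold $\epsilon$ to decide whether to answer.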
