Misinforming LLMs: vulnerabilities, challenges and opportunities

Bo Zhou, Daniel Geißler, Paul Lukowicz

TL;DR

This paper investigates whether current transformer-based language systems can be trusted to produce accurate and justified information. It argues that LLMs rely primarily on correlations in word-embedding spaces rather than genuine cognition, which leaves them vulnerable to misinformation and hallucination. The authors critique existing evaluation methods and show how crafted prompts and prompt-based jailbreaks can manipulate model outputs. They propose a hybrid direction combining retrieval-augmented generation, knowledge graphs, and logic programming to ground outputs in truth and provide explainable reasoning.

Abstract

Large Language Models (LLMs) have made significant advances in natural language processing, but their underlying mechanisms are often misunderstood. Despite exhibiting coherent answers and apparent reasoning behaviors, LLMs rely on statistical patterns in word embeddings rather than true cognitive processes. This leads to vulnerabilities such as "hallucination" and misinformation. The paper argues that current LLM architectures are inherently untrustworthy due to their reliance on correlations of sequential patterns of word embedding vectors. However, ongoing research into combining generative transformer-based models with fact bases and logic programming languages may lead to the development of trustworthy LLMs capable of generating statements based on given truth and explaining their own reasoning process.
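
As a rough illustration of the fact-base-plus-logic direction mentioned in the abstract, the following minimal Python sketch checks a candidate statement against a small set of given facts and returns the premises that justify it. It is not the authors' system: the names Fact, FACTS, derive and check_claim, the toy facts, and the single transitivity rule are all assumptions made for this example, and a real system would sit behind a generative model and use a proper logic programming engine.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class Fact(NamedTuple):
    """A (subject, relation, object) triple, as in a tiny knowledge graph."""
    subject: str
    relation: str
    obj: str


# Hypothetical fact base standing in for the "given truth".
FACTS: Set[Fact] = {
    Fact("penguin", "is_a", "bird"),
    Fact("bird", "is_a", "animal"),
}

Proof = Optional[Tuple[Fact, Fact]]  # None means "given", else the two premises


def derive(facts: Set[Fact]) -> Dict[Fact, Proof]:
    """Forward-chain one rule (transitivity of is_a) until nothing new appears."""
    proofs: Dict[Fact, Proof] = {f: None for f in facts}
    changed = True
    while changed:
        changed = False
        known = list(proofs)  # snapshot so we can add to proofs while looping
        for a in known:
            for b in known:
                if a.relation == b.relation == "is_a" and a.obj == b.subject:
                    c = Fact(a.subject, "is_a", b.obj)
                    if c not in proofs:
                        proofs[c] = (a, b)
                        changed = True
    return proofs


def check_claim(claim: Fact, facts: Set[Fact] = FACTS) -> Tuple[bool, str]:
    """Return whether the claim is supported, plus a human-readable reason."""
    proofs = derive(facts)
    if claim not in proofs:
        return False, "not derivable from the fact base"
    premises = proofs[claim]
    if premises is None:
        return True, "stated directly in the fact base"
    a, b = premises
    return True, (
        f"derived from '{a.subject} {a.relation} {a.obj}' "
        f"and '{b.subject} {b.relation} {b.obj}'"
    )


if __name__ == "__main__":
    print(check_claim(Fact("penguin", "is_a", "animal")))
    print(check_claim(Fact("penguin", "is_a", "fish")))
```

In this sketch a statement is accepted only if it can be derived from the given facts, and the returned reason names the premises used, which is the kind of explainable grounding the abstract points toward.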

Paper Structure (4 sections, 1 figure)

This paper contains 4 sections, 1 figure.

Figures (1)

  • Figure 1: An example of scientific-sounding misinformation (marked in red) misleading the LLM (Llama 3 70B, 6-bit quantization).