A Survey of Hallucination in Large Visual Language Models

Wei Lan, Wenyi Chen, Qingfeng Chen, Shirui Pan, Huiyu Zhou, Yi Pan

TL;DR

This survey introduces the background of LVLMs and hallucinations, describes the structure of LVLMs and the main causes of hallucination, and suggests future research directions to enhance the dependability and utility of LVLMs.

Abstract

Large Visual Language Models (LVLMs) build on Large Language Models (LLMs) by integrating the visual modality, which enhances user interaction and enriches the user experience. They have demonstrated powerful information processing and generation capabilities. However, hallucinations limit the potential and practical effectiveness of LVLMs in various fields. Although much work has been devoted to hallucination mitigation and correction, few reviews summarize this issue. In this survey, we first introduce the background of LVLMs and hallucinations. Then, we describe the structure of LVLMs and the main causes of hallucination. Further, we summarize recent work on hallucination correction and mitigation. In addition, we present the available hallucination evaluation benchmarks for LVLMs from judgmental and generative perspectives. Finally, we suggest some future research directions to enhance the dependability and utility of LVLMs.
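To make the judgmental/generative distinction concrete, here is a minimal sketch of how each kind of benchmark is commonly scored. This is not code from the survey. It assumes a POPE-style judgment setup, where the model answers yes/no questions such as "Is there a dog in the image?", and a CHAIR-style generative setup, where object mentions have already been extracted from free-form captions and compared with ground-truth annotations. The function names `judgment_scores` and `chair_scores` are hypothetical, and the object-extraction step (synonym matching against a vocabulary) is assumed to happen upstream.

```python
from typing import Dict, List, Sequence, Set, Tuple


def judgment_scores(preds: Sequence[bool], labels: Sequence[bool]) -> Dict[str, float]:
    """Score yes/no answers to object-existence probes (POPE-style, assumed setup).

    preds[i] is True if the model answered "yes"; labels[i] is True if the
    object is actually present in the image.
    """
    n = len(labels)
    if n == 0 or len(preds) != n:
        raise ValueError("preds and labels must be non-empty and equally long")
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))  # "yes" to an absent object
    fn = sum((not p) and l for p, l in zip(preds, labels))
    tn = n - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / n,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / n,  # a high value suggests a "yes" bias
    }


def chair_scores(mentioned: Sequence[Set[str]], ground_truth: Sequence[Set[str]]) -> Tuple[float, float]:
    """CHAIR-style scores for generated captions (assumed pre-extracted objects).

    Returns (CHAIR_i, CHAIR_s): the fraction of mentioned objects that are not
    in the image, and the fraction of captions with at least one such object.
    """
    hallucinated: List[Set[str]] = [m - gt for m, gt in zip(mentioned, ground_truth)]
    total_mentions = sum(len(m) for m in mentioned)
    chair_i = sum(len(h) for h in hallucinated) / total_mentions if total_mentions else 0.0
    chair_s = sum(1 for h in hallucinated if h) / len(hallucinated) if hallucinated else 0.0
    return chair_i, chair_s


if __name__ == "__main__":
    print(judgment_scores([True, True, False, True], [True, False, False, True]))
    print(chair_scores([{"dog", "frisbee"}, {"cat"}], [{"dog"}, {"cat", "sofa"}]))
```

In the usage example, the second caption mentions only objects that are present, while the first invents a "frisbee", so one of three mentioned objects is hallucinated (CHAIR_i = 1/3) and one of two captions contains a hallucination (CHAIR_s = 0.5). Judgment-based scores are easy to compute but only probe the questions asked; generative scores reflect free-form output but depend on the quality of the object-extraction step.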

Paper Structure

This paper contains 39 sections, 20 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: (A) The framework of an LVLM. (B) Examples of hallucination phenomena. Red text marks the hallucinated part of the LVLM's response.
  • Figure 2: A taxonomy of hallucination correction.
  • Figure 3: The framework of HalluciDoctor.
  • Figure 4: The framework of COMM.
  • Figure 5: The framework of DualFocus. $Q_2$ is adapted from $Q_1$.
  • ...and 4 more figures